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Kalman Filter Models for Network Forecasting 


By C. D. PACK and B. A. WHITAKER 


(Manuscript received December 31, 1980) 


The Bell System has recently completed studies that are expected 
to result in substantially improved forecasts for use in network 
planning. These improved forecasts are achieved through the use of 
new forecasting algorithms that employ Kalman filter models. To 
motivate the selection of Kalman filter forecasting procedures, we 
describe the Bell System’s special data characteristics and processing 
requirements in the network planning process. We also discuss the 
Kalman filter models, their statistical properties, the model identift- 
cation process, and certain implementation considerations. 


|. INTRODUCTION 


Projections of message circult usage (as measured in hundred call 
seconds or ccs), from which message circuit requirements (trunks) are 
developed, and special services* circuit demand are fundamental parts 
of the Bell System’s network planning and provisioning process. An 
overview of the information flows in this process is shown in Fig. 1; 
more details are given in Refs. 1 to 4 and in the companion forecasting 
papers in this issue. 

Since these projections strongly influence the allocations of several 
billions of construction dollars annually, it is important that they 
possess high quality statistical properties. For example, the projections 
should be unbiased; that is, the forecasts should not be consistently 


* Special services is a generic term referring to all Bell System services other than 
ordinary message telephone service. Examples include foreign exchange lines, tie lines, 
and private lines. 
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Fig. 1—Network planning and provisioning process. 


high or low. A biased forecast will result in either over expenditures or 
equipment shortages, depending on the sign of the bias. Also, the 
forecasts should be stable, cr precise, never varying too much from the 
realized true requirements. Previous studies have shown that highly 
variable forecasts result in increased reserve capacity requirements 
necessary to meet customer demands.” 

The service and economic motivations for high quality forecasts of 
message trunks and special services circuits have led the Bell System 
to reevaluate the existing projection algorithms used in these processes 
and to recommend improved methods, as necessary. It should be noted 
that a further motivation for this reevaluation is that most existing 
projection methods utilize curve fitting and extrapolation algorithms 
that were originally designed for manual calculations and that were 
available prior to the widespread use of computers and the advent of 
modern estimation techniques.” 

The possible approaches to improved projection methods were in- 
fluenced strongly by the particular characteristics of the time series 
and by the typical mode in which the forecasts are produced. 

Two factors were crucial in the selection of forecasting models and 
algorithms: the need to produce a large number of forecasts in a fully 
mechanized system and the typical dearth of data for any individual 
series. | 

A large number of forecasts is necessary because trunk and special 
services circuit requirements must be forecast at least once per year 
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for each trunk group and special services circuit group on record. Thus, 
there may be up to 100,000 trunk group time series and tens of 
thousands of special services circuit time series forecast in each run of 
the mechanized forecasting systems illustrated in Fig. 1. Therefore, 
these procedures must be fully mechanized. Moreover, for ease of 
comprehension and for computational efficiency, the algorithms must 
be simple to program and maintain. 

The sparsity of data is perhaps the more restrictive of the two key 
considerations in selecting new projection algorithms. Typically, the 
individual time series have between 1 and 10 years of relevant data, 
with 3 to 5 years being common. The fact that a projection may have 
to be based on less than 5 years of data (up to 60 points if monthly 
data were available), make the use of many approaches, including the 
popular Box-Jenkins ARIMA (autoregressive integrated moving av- 
erage) models® that require more than 100 data points, infeasible. 

The linear Kalman filter, in its most general form, was derived by 
Kalman’ and Kalman and Bucy’ in the early 1960s. The motivation 
for their work had its roots in various control theory applications. Of 
particular note was the implementation of the algorithms in tracking 
systems for the aerospace industry. In recent years, the models have 
also been used in power systems, process control, and forecasting. 

In Section III, we describe the general Kalman filter linear model, 
analyze and interpret the key matrices of the formulation, and relate 
the Kalman filter to other common forecasting models. In Section IV, 
we discuss various implementation considerations that are necessary 
to reduce the Kalman filter algorithm to practice. In Section V, we 
indicate methods for evaluating the effectiveness of the Kalman filter 
models and discuss the importance of certain key statistics. Finally, in 
Section VI, we summarize our conclusions concerning the use of 
Kalman filters for forecasting in the Bell System. 

Three companion papers to this overview follow. These papers 
describe the successful use of Kalman filter models for three Bell 
System forecasting applications—two concern message trunk forecast- 
ing and one covers special services demand forecasting. 

The first trunk forecasting paper, by J. P. Moreland, describes a 
simple linear, two-state model for use in projecting busy season (yearly 
peak) trunk group loads.” The second paper, by A. Ionescu-Graff, 
describes a linear, two-state Kalman filter with an absorbing barrier 
that can improve the quality of special services demand forecasts." 
The third paper, by C. R. Szelag, derives a Kalman filter for traffic 
that has not only linear growth, but also a seasonal.within-year pattern 
as well.!° That is, Szelag illustrates how, for trunk group load patterns 
exhibiting seasonality, nonbusy season data can be used to update and 
improve estimates of imminent busy season loads. 
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ll. KALMAN FILTER DESCRIPTION 


The Kalman filter, described in more detail in Section III, has many 
desirable properties. Most of these properties are not unique to the 
Kalman filter; however, because of its generality and particular form, 
as well as statistical properties, computational characteristics, and 
robust qualities it should be considered in most estimation applica- 
tions. 


2.1 Models 


The models used are based on state-space representations of the 
variables being estimated. The state-space formulation implies that, 
at each point in time, the process being modeled is described by a 
vector of state variables that summarize all relevant quantities of 
interest. In most instances, the state variables have physical interpre- 
tations, such as trunk quantities, growth rates, and so on. Then, a 
further characterization of the model specifies how this state vector 
evolves over time. 

The Kalman filter algorithm uses this model of the time behavior of 
the system along with “noisy” observations or measurements of some 
system variables to produce optimal estimates of all state variables. 
These estimates are then used in the process model to determine state 
estimates for future time periods. 

The distinction between the state-space models and the ARIMA 
models is mostly in the model identification process. That is, while the 
interpretation of the state-space models permits the user to choose a 
model based on physical properties (and hence when only limited data 
are available), the “time series” approach of Box and Jenkins attempts 
to have the data specify the model based on certain first- and second- 
order characteristics. However, after a state-space time-series model 
is known, one can find nearly equivalent representations of that model 
for either theory. 

_ The particular state-space formulation of Kalman has some desira- 
ble features. First, it lends itself to simple, recursive estimation of the 
parameters. That is, no data history need be stored. As new data or 
observations become available, they are processed and the stored state 
vectors are updated accordingly. In fact, this recursive calculation 
suggests a Markovian property of the filter: the current state vector 
summarizes all relevant historical information concerning the history 
of the time series. Second, there is a provision for separate character- 
ization of the two sources of significant estimation errors: the dynamics 
of the true process and relationship between the state variables and 
the measurements used to estimate these variables. Third, the model 
provides an analytic framework for studying relationships among the 
first- and second-order properties of the state variables and measure- 
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ments. For example, one can derive analytic expressions for forecast 
variances as functions of the number of data points processed and the 
autocorrelation matrix of the measurement errors. Finally, the model 
is general enough to include as special cases the common models: 
exponential smoothing, weighted least squares, multiple linear regres- 
sion, and Wiener filtering. 


2.2 Statistical properties 


A correctly specified linear Kalman filter produces forecasts that 
have minimum mean square error.* Moreover, the forecasts are un- 
biased and have minimum variance. When the errors are Gaussian, 
these properties hold without restriction to the class of linear models. 
The estimators can also be derived using maximum likelihood or Bayes 
models. When the models are only approximately correct, the gener- 
ality of the formulation allows one to analyze the filter’s suboptimal 
performance and, if desired, to adjust the filter’s parameters as appro- 
priate." 


il. DISCRETE-TIME LINEAR KALMAN FILTER MODEL 
3.1 The model 


It is assumed that the true process dynamics are described by the 
following linear transition equation 


Xn+1 = oXy, + U, + Wn, (1) 
where 
X, = an s-vector of state variables in period n, 
@ = ans Xs transition matrix that may, in general, depend on n, 
Wn = an s-vector of random modeling errors, i.e., random deviations 
of the true process from the assumed linear relation defined by 
o, and 


U, = an s-vector of deterministic changes in state. 
The one-step projection formula is given by 


Xai = OXnn -—- Un, (2) 


where, in general, X,+2n is an estimate of Xn+. (k = 0) given data 
Yi, °**, Yn In periods 1 through n, where y, is a d-vector of observed 
variables in period 7. 

The relations that distinguish the Kalman filter model and associ- 
ated computational procedures from other linear estimation tech- 
niques are the particular model relating X,, to y, and the algorithm for 


* These terms and others are defined in Section V. 
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computing X,,,. The d X s matrix H, which in general may depend on 
n, defines the relationship between y,, and X,, as 


yn = HX, + pn, (3) 


where pv, 1S a d-vector of measurement errors. At time n, the vector 
X,,n 1s computed by 


A 


Kin = Rui + Kil¥n a AX) n-1). (4) 


The s X d “Kalman gain” matrix K,, can be calculated recursively by 
the following equations: 


K, = P,H™(HP,H’? + RR)" 
S, = (I — KnH)Pn 
Pasi = Sid’ + Q, (5) 


where 

(t) R is the covariance matrix of the measurement errors, i.e., 
R = E(vnvn), 

(ut) @ is the covariance matrix of the modeling errors, i.e., Q = 
E(wnw;), and 

(tii) it is assumed that E(v,) = E(w,) = 0 for all n, E(w,v7) = 0 for 
all (n, 1), E(wnw; ) = 0 for all n ¥ i, and E(v,»;) = 0 for n ¥ i, and 

(tv) in general, @ and R may depend on n. 

Thus, in summary, the forecasting procedure has the following steps: 

1. When n = 0, the filter is initialized by a user-supplied estimate 
Xoo of the initial state vector Xo and So. 

2. Using these estimates, a one-period-ahead forecast is produced 
using eq. (2). 

3. The gain matrix, K,, and the matrices S, and Pn+: are calculated. 

4, When a new observation is received, eq. (4) is used to obtain a 
“smoothed” estimate X,,,, of the present state vector. 

5. Then using this new estimate, a one-period-ahead forecast is 
produced. The period index n is incremented and Step 3 is repeated. 

The algorithm continues to process new observations and produce 
forecasts in this manner. 

Several points should be noted: 

1. If the matrices H, R, ¢, and @ are independent of n, the gain 
matrix, K,, and the matrices S and P are independent of the observa- 
tions. Thus, these matrices can be precalculated for use in the algo- 
rithm. 

2. No past observations must be stored since all historical informa- 
tion is contained in the “smoothed” estimate X,,,, or, equivalently via 
eq. (2), the one-period-ahead projection Xn+1n. 
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3. Only the one-period-ahead forecast must be saved to be used in 
the next period’s process. 

When the assumptions listed above in (viz) are valid and the models 
accurately describe the true process dynamics and the measurement 
system, the Kalman filter produces unbiased estimates of X,+2; that 
1S, E(Xn+kn) = X42 for k = 0. In addition, the estimates > nee have 
minimum variance in the class of all unbiased estimators.”’ If w, and 
py, are Gaussian, then no restriction to the class of linear estimates is 
required and the estimators can also be derived from maximum like- 
lihood and Bayesian models.’ Conveniently, the Kalman formulation 
actually provides estimators for the estimation error variance matrices 
of interest: 


(smooth) 
Sn = E(Xnn — Xn)(Kan — Xn)”, (6) 
and, for k = 1, 
(predict) 
Pria = E(Xntan — Xnve)(Knten — Kase)”. (7) 


Therefore, the requirement that the user specify an initial estimate of 
So is a need for an estimate of the covariance matrix of the initial state 
vector ee 

Analyses of the sensitivity of the filter performance to model accu- 
racy and of the modified estimators of error covariance matrices for 
the case of suboptimal gains K’ is given in Ref. 13. 


3.2 Interpretation 


From eq. (4), we see that the vector Xnans the smoothed estimate of 
X,, is derived as the previous one-step projection X,,,-1, plus a linear 
combination (weighting) of the differences between the measurement 
yn and the previous estimate (forecast) of these measurements, 
HX&nn-1. The weights assigned to the difference terms are appropriate 
components of the gain matrix K,,. It is also important to note that 
Soo depends on Xjn—1, Yn, Kn, and H, but not explicitly on yi, ---, 
yrn-1. This recursive nature of the Kalman filter eliminates the need 
for storage of historical data. 

We can give some insight into the effect K, has on algorithm 
performance, without actually describing a specific model. It can be 
seen from eq. (5) that K, has terms which are directly proportional to 
the elements of the covariance matrix Q* and inversely proportional 


* As we indicate later in Section 3.4, this “proportionality” to Q is strongest for 
large n. 
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to the elements of A. That is, K, is, in a sense, proportional to the 
variability of the true process dynamics and inversely proportional to 
the measurement variability. Thus, it is the “Q/R” relationship which 
defines the responsiveness of the filter (via the K matrix) to estimated 
errors in state (y, — Fei) As the elements of K, decrease (by 
decreasing Q or increasing f&), the forecasts become more stable. That 
is, if we have more confidence in the unbiasedness of the process model 
or less confidence in our observations, the filter will be designed to 
respond more slowly to apparent deviations from the predicted trend 
line. As K,, increases (by increasing Q or decreasing R), forecasts detect 
and respond to data that deviate from the assumed deterministic linear 
model, represented by the expected value of eq. (1). 


3.3 An example 


Consider a simple linear two-state model (s = 2) , 


where 
_ | Xnt1 | _ 1 1 Xn On 


Yn = Xn + Vn. (9) 


and (d = 1) 


We further assume that R = 1 (since the gain matrix depends only on 
Q/R, we do this with no loss in generality). To complete the model, 
the 2 X 2 matrix Q must be specified. This simple two-state model is 
fundamental to the models presented in the three companion papers. 
In these models, x, is referred to as the level of the process, and xX,, as 
the growth increment. 

We now show how the various components of Q influence the 
response of the filter to a measurement y,. First, note that eq. (4) can 
be rewritten as 


A % 7 (n) mee: 
x= [i]- [Ether fe)] ao 


Gin List — RY? (Yn = Magica) 


Hence, the smoothing process is determined by the specification of 
two sequences of number k{}, ---, Ri, --- and R$}, ---, RSP, ---. 
As we indicated above, these gains tend to be proportional to the 
elements of Q. 

Fig. 2 shows the filter operation and the role of the gain sequence 
{k{?}. Initially, attention should be focused on the trend line at time 
n — 1 derived from n — 1 pieces of data, yi, «++ , yn-1 and note that 
Xn-1n-1 lies on that line. Further, x,,,-1 is a straight projection of this 
trend one-period ahead. When the new observation, yn, is obtained, 
the slope of the trend line is adjusted upward in the direction of yn 
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Fig. 2—Projection using the Kalman filter. 


relative to Xnn-1. The smoothed estimate Xnn of Xn iS Xnn—-1, plus a 
factor k{? times the increment yp, — £nn-1. The new trend line passes 
through x,,, and its extrapolation produces £n+1,n. 

The choice of Q (and R) determines the k$? and, hence, the 
response of the filter to observation y, relative to Xnn-1. Note that the 
smoothed estimate Xnn Of Xn, NOt yn, is projected into the future. Thus, 
we do not project all of the noise in y, into the future. This is a 
fundamental distinction between Kalman filter projections and most 
existing load-projection algorithms used by the Bell System. 


3.4 Estimating matrices 


If all Kalman filter model matrices (¢, H, So, Q, and R) are known, 
the algorithm is completely specified; moreover, the desirable statisti- 
cal properties of the filter are assured. However, in general, the true 
model is not known precisely, or it is likely that the model is to be 
applied to many different estimation problems. Hence, in practice, we 
often settle for less optimal (in a bias or variance sense), but more 
robust properties for our forecasts. That is, we choose to achieve a 
reasonable balance between bias and variability (stability) over a wide 
range of practical interest for key parameters while relaxing some of 
the assumptions in Section 3.1. 

Usually, it is the case that the processes being modeled suggest a 
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reasonable choice of matrices @ and H. (Examples are discussed in the 
companion papers in this issue.) Also, typically, the data characteristics 
can be analyzed so that & is known either analytically or empirically. 
Therefore, the main concern is the selection of matrices Sp and Q. The 
former strongly influences the transient characteristics of the filter; 
that is, K, is strongly dependent on “S)/R” for n small (<10). The 
latter affects the steady-state performance; that is, K, approaches 
“Q/R” for large n. 

These matrices Sp and Q can be determined analytically or empiri- 
cally to provide the desired responsiveness and statistical properties 
over many time series. The key to the analytical approach is the 
matrix P,+41 In eq. (5), which describes forecast variance as a function 
of model matrices, when the assumptions of Section 3.1 are correct. 
Empirical studies attempt to tune filter performance so that a robust 
balance of unbiasedness and stability is achieved over the test cases. 
The performance trade-offs are illustrated generically in Fig. 3 as a 
function of K, whose dependence on Sp and Q was stated previously. 
The studies described in the companion papers illustrate these two 
approaches to filter design. 

We have ignored the extensive literature (see, for example, Ref. 14) 
on model identification for Kalman filters, because, as for the Box- 
Jenkins technique, substantial data is required for each time series. 


3.5 Special cases 
The Kalman filter model includes as special cases many common 
estimation techniques. We indicate three such examples. 
3.5.1 Multiple regression 
The multiple regression approach to estimation assumes that a time 
series { y,} is well-approximated by 
yn = HEX +, (11) 


where H, is an s-vector of observations of the independent regression 
variables, X is an s-vector of (constant) regression weights, and pv, is an 
error term. The Kalman filter model is obtained by allowing the 
regression coefficients to depend on n and by adding the dynamics 
relation 


Xp+1 = ».& F (12) 


Clearly, the model includes the autoregressive case (H,, is composed 
of previous realizations of y,) and can be generalized to vectors yn. 


3.5.2 Exponential smoothing 


Exponential smoothing is a process whereby current estimates of 
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Fig. 3—Empirical performance trade-offs. 


state X,,, are composed of a weighted average of the previous estimate 
at state (X,-1) and the current measurement (y,). That is, 


Xn = Xn-1 + Ryn — Xn-1). (13) 


Note that the weight & is a constant. The term exponential smoothing 
was chosen because the previous measurements influence current 
estimates by weights that decrease exponentially with lag. 

Clearly, eq. (13) is of the form of eq. (4) and can be generalized to a 
Kalman filter by including eqs. (2) and (3), and the dependence of k 
on n. 


3.5.3 Wiener filtering 


Wiener filtering’ corresponds to the case of constant Kalman gain 
matrix, i.e., when K, = K for all n. If ¢, H, R, and Q are constant, then 
the steady-state error covariance matrices exist and a Kalman filter 
with gains calculated using these matrices is identical to a Wiener 
filter. 


IV. IMPLEMENTATION CONSIDERATIONS 


Experience has shown that the Kalman algorithm, as described in 
Sections II and III, performs well, as long as the model is reasonable 
and the data y, are consistent with the model assumptions. However, 
in practice, additional logic or considerations are necessary to improve 
the performance when outlier data are present or when certain types 
of nonstationarity are present and to improve the computational 
efficiency of the procedures. 

An overview of a typical Kalman filter implementation is shown in 
Fig. 4. We will describe the components of the Kalman filter system in 
the following subsections. 
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Fig. 4—Kalman filter implementation. 


4.1 Data smoothing and projection 


The Kalman filter algorithm, described in Section 3.1, comprises the 
smoothing and projection functions. 


4.2 Initiation, transient response, and gain sequences © 


In some Kalman filter applications, the transient period is very short 
relative to the total time the filter is in operation. However, in the 
applications described in Refs. 9 to 11, a series rarely exists for more 
than 10 years, with 3 to 5 years being typical. Therefore, the transient 
response of the filter is important and Xoo and Sp (and, hence, K,, for 
small n) must be carefully chosen to provide good performance during 
the transient period. 

Surprisingly, good statistical performance in both the transient and 
equilibrium states is often achieved with fixed (K,, = K for all n) or 
finite (K = K,, for n = n*) gain sequences. Moreover, significant 
computational efficiencies are obtained when the precomputed finite 
gains are stored for on-line use. The companion papers describe the 
success of fixed and finite gain sequences for the respective applica- 
tions. 


4.3 Outlier detection, deterministic events 


We define unusual data to be that representing either an unusually 
large (outlier) measurement error (v,) or an unforeseen deterministic 
event (U,,). The importance of these two types of errors is that the 
smoothing process (4) tends to adjust the previous forecast X,,,-1 in 
the direction of the new measurement y,, with the movement being 
proportional to the estimated error (yn — HX,n-1). Therefore, without 
additional logic of an adaptive nature, outliers would cause overreac- 
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tion to measurements; unforeseen deterministic events would be in- 
sufficiently accounted for because the full (rather than smoothed) 
impact should have been entered in eq. (2). 

However, in practice, one cannot always satisfy the two conflicting 
objectives—ignore bad measurements and react strongly to unforeseen 
deterministic events. Hence, either robust estimators must be em- 
ployed or logic must be provided to identify and distinguish the two 
cases and to not overreact to either. The balance is again between bias 
and stability: over-response reduces bias but degrades stability, and 
vice versa. 

Considerable discussion of this trade-off and the resulting filter 
design for one application is described in Ref. 9. 


V. MODEL EVALUATION 


In previous sections of this paper, we have alluded to various 
statistical criteria that may be used to evaluate the performance of the 
filter. Clearly, none is right or wrong unless it can be shown that some 
significant costs are directly and uniquely related to a particular 
criterion. We have never seen such justifications. However, it is usually 
the case that costs are related in some indirect fashion to various first- 
and second-order error statistics of the forecasts. The companion 
papers will each refer to some subset of the following statistics, possibly 
normalized to be stated as a percent: 


Bias 
Antin = E(Xntan — Xn) 
(In)stability (view-over-view variability) 
Sn+en = E(Xnthn — Xnsen)(Kntin — Xnsnn—a)” 
(Im)precision or variance 
Pritn = E(Xntan — E(Xntbn))(Xnsen — E(Xnsen))” 
Mean square error 


Mastkn = E(Xn+an = X,,) heer ory x,,)7 


Clearly, there exist some analytical relationships among these sta- 
tistics. We will not derive any here; however, it is of interest to point 
out that, if Rows and X, are uncorrelated (a sufficient condition is 
that Q = 0 or k = 0 for the linear model of Section III), then 


T 
Mn+kn = An+enA nt+kn + P jae 
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which, in the scalar case, reduces to 
Z 
Mn-+r,n = As a Peer . 


The rms error is commonly calculated as (Mn+z»)'”” for this last case. 


VI. STATUS AND CONCLUSIONS 


The Kalman filter model promises to provide forecasts, for use in 
Bell System network planning, that are both substantially improved 
in a Statistical sense relative to existing methods and computationally 
more efficient. The former claim is based on testing by analysis, 
computer simulation, and field study. The latter, because of the 
efficient recursive calculations of the filter, has been borne out through 
Bell Laboratories programs and limited field studies. The companion 
papers in this issue will elaborate on these conclusions and will describe 
plans to implement the algorithms in standard, mechanized network 
planning tools. 
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Forecasts of busy season trunk group traffic loads are required for 
planning the Bell System’s message network. Forecasting algorithms 
currently in use obtain estimates of future loads by multiplying the 
most recent measurement of busy season load by an aggregate growth 
factor. Because of statistical errors in measured loads and differences 
between individual trunk group and aggregate growth factors, the 
resulting forecasts can have large statistical errors. In this paper we 
extend earlier work to develop a new algorithm, called the sequential 
protection algorithm (sPA), based on a linear two-state Kalman filter, 
together with logic for detecting and responding to unusually large 
measurement errors or changes in trend. In typical applications of 
Kalman filtering, the statistics of system noises, measurement errors, 
and initial conditions are known and the filter parameters (Kalman 
gains) are selected accordingly. For our application, however, these 
statistics cannot be determined without error. Consequently, we de- 
velop a method for selecting robust filter parameters which provide 
improved performance, independent of system noises, measurement 
errors, and initial conditions. In particular, under the assumption of 
linear growth for 5-year intervals, the average rms 1-year forecast 
error of SPA is about 10 percent less than that of the existing algo- 
rithms. Field test results confirm the theoretical results presented 
here. Accordingly, specifications have been written for inclusion of 
SPA in the Bell System’s standard trunk forecasting systems. 


l. INTRODUCTION AND SUMMARY 


Forecasts of busy season (yearly peak) trunk group traffic-loads are 
required for the planning of the Bell System’s message network. These 
forecasts are used to design traffic networks which minimize the cost 
of the trunks required to satisfy anticipated customer demands. 
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The standard load forecasting algorithms currently in use in the Bell 
System obtain estimates of future loads by multiplying the most recent 
measurement of busy season load by an aggregate growth factor; for 
example, the average of the growth factors obtained by trending the 
total office loads at each end of the trunk group. Descriptions and 
comparisons of the various algorithms currently in use are given in 
Ref. 1. 

As explained in Ref. 1, these algorithms have two significant sources 
of error: (1) Because of the finite amount of data upon which measure- 
ments are based, measured loads can have large statistical errors; 
standard deviations fall in the range of about 5 to 40 percent depending 
upon load size and type of measurement system.’ (ii) Individual trunk 
group growth factors can differ from the aggregate growth factor; 
standard deviations of 6 percent have been observed. These forecast 
errors are significant since they lead to an increase in the reserve trunk 
capacity required to satisfy customer demands.’ 

To reduce forecast error and, hence, reserve trunking capacity, a 
new algorithm, called the sequential projection algorithm (spa), has 
been developed to forecast busy season traffic loads within the Bell 
System. The SPA is based on a linear two-state Kalman filter model, 
whose use in traffic forecasting was studied first by David and Pack,’ 
together with logic for detecting and responding to outlier measure- 
ments, i.e., unusually large measurement errors or changes in trend. 

As discussed in Ref. 1, David and Pack tested several Kalman filter 
models—some with as many as eight state variables and four data 
variables. In summary, for the planning interval of interest (1 to 5 
years ahead) none performed consistently or significantly better than 
the relatively simple two-state (traffic load and incremental growth), 
one-data variable (measured load) model. 

The two-state Kalman filter establishes a linear trend for individual 
traffic loads as follows: As illustrated in Fig. 1, the level, or smoothed 
base load, is a weighted average of the most recent measurement, or 
base load, and the previous 1-year forecast. Similarly, the smoothed 
growth increment is a weighted average of the measured and previously 
forecasted increments. (The measured increment is, by definition, the 
measured load minus the previous year’s smoothed base load.) 

The performance of the Kalman filter, as measured by mean square 
forecast error, depends upon the filter parameters, i.e., the gains an 
and B, in Fig. 1, the standard deviation of load measurement and 
growth estimation errors, and upon the assumed evolution of the true 
load. | 

Since spA will be used under a variety of possible operating condi- 
tions, the filter parameters could be tuned to provide optimal perform- 
ance, i.e., minimum mean square forecast error, for each application. 
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SMOOTHED LOAD = FORECASTED LOAD (1-@,) + MEASURED LOAD (ap) 


SMOOTHED INCREMENT = FORECASTED INCREMENT (1-2) + MEASURED 
INCREMENT (;,) 


SMOOTHED 
INCREMENT 


LOAD 


FORECASTED 
INCREMENT 


A MEASURED LOAD 
—* FORECASTED LOAD 
O SMOOTHED LOAD 





PREVIOUS BASE FUTURE FUTURE 
YEAR YEAR YEAR 1 YEAR 2 


Fig. 1—Basic operation of SPA. 


For example, in the Bell System we employ two types of load mea- 
surement systems; Bell Operating Companies obtain load estimates 
from a direct measurement of trunk group usage, while Long Lines 
derives estimates from point-to-point, e.g., end-office to end-office data 
provided by the Centralized Message Data System (cmps).” For trunk 
group data, the standard deviation of measurement error is in the 
range of about 5 to 10 percent, depending on load size. For point-to- 
point data, the range is about 10 to 40 percent. (See Fig. 2.) Accord- 
ingly, different parameters could be used for different measurement 
systems and for different load ranges. 


40 


30 


POINT-TO-POINT (CMDS) DATA 
20 


rms ERROR IN PERCENT 


TRUNK GROUP DATA 





0 5 10 15 20 25 
OFFERED LOAD.!IN ERLANGS 


Fig. 2—Sampling error vs. offered load. 
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Indeed, the initial development of spa’ was intended for use with 
trunk group data only, and the objective of the current study was to 
extend the original work to applications using point-to-point data. As 
suggested above, one possible solution would be to develop multiple 
versions of SPA. 

Instead, in this paper, we develop a single, robust SPA whose param- 
eters are selected to provide improved performance over the entire 
range of operating conditions, including the use of either trunk group 
or point-to-point data. | 

The use of robust parameters is important for two reasons: (i) A 
single spA for all applications should be simpler to implement and 
maintain than multiple versions. (iz) The actual values of the statistical 
parameters for each application cannot be determined without error, 
and our results show that erroneously assumed values can lead to a 
performance substantially worse than that of conventional projection 
methods. Of course, as with any robust technique, we pay a premium 
by receiving less than theoretically optimal performance for protection 
against the possibility of a performance worse than conventional 
methods. 

Section II of this paper provides a qualitative overview of the 
functions performed by spa that include procedures for detecting and 


CURRENT PREVIOUS FORECAST 
BASE LOAD (LEVEL, INCREMENT) 






ADJUST BASE LOAD 
(CONSISTENT ASSUMPTIONS) 
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(UPDATE LEVEL, INCREMENT) 
PROJECTION 


PROJECTED 
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| 
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I 
| 
1 
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Fig. 3—Sequential projection algorithm. 
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responding to outliers, as well as the smoothing and projection func- 
tions of the Kalman filter. Section III defines the mathematical model 
and procedures for selecting robust filter parameters; Section IV gives 
numerical results; Section V develops the outlier detection thresholds; 
and Section VI gives the conclusions. 


ll. OVERVIEW OF SPA 


As shown in Fig. 3, SPA is composed of five major components: 
algorithm initialization, base-load adjustments, outlier detection, 
smoothing, and projection. In the following paragraphs, we provide a 
qualitative description of the operations performed by each of these 
components. 


2.1 Initialization 


As indicated in Fig. 4, the smoothed base load in the first year of 
operation, and in certain other cases discussed in Section 2.3, is equal 
to the measured base load. The smoothed growth increment is calcu- 
lated by multiplying this base load by a growth factor obtained, for 
example, by trending the total office loads at each end of the trunk 


group.’ 
2.2 Base load adjustments 


In the second and subsequent years of operation, SPA updates both 
the level and incremental growth by comparing the most recent base 
load with the previous 1-year forecast of that same load. Since the 
base and forecasted loads must correspond to the same traffic routings, 


4A BASE (MEASURED) LOAD 
—@ FORECASTED LOAD 
O SMOOTHED-BASE LOAD 


TRUNK GROUP FIRST-ROUTE LOAD 





YEAR OF OPERATION 


Fig. 4—Initialization of SPA. 
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differences between the previous and current routings are accounted 
for by adding an adjustment term to the base load so that the adjusted 
base load agrees with previous routing assumptions. After performing 
the outlier detection and smoothing functions described below, the 
routing adjustment term is subtracted from the smoothed base load so 
that it agrees with the current routing assumptions. 

Furthermore, the previously forecasted load is adjusted to remove 
the impact of deterministic events, such as a proposed tariff change, 
that were predicted but did not occur. 


2.3 Outlier detection 


Under the assumption that the observed forecast error (i.e., the 
difference between the forecasted and measured loads) has a Gaussian 
distribution with zero mean, the linear Kalman filter upon which SPA 
is based provides the minimum attainable mean squared forecast 
error.’ In practice, however, the normal statistical errors (because of 
the finite measurement interval, day-to-day load variation, random 
variations in CMDS point-to-point sample size,” and growth errors) are 
occasionally contaminated by wiring errors, recording errors, or unex- 
pected changes in the trend of the true load. In such cases, when the 
observed forecast error deviates from a Gaussian distribution, the 
linear Kalman filter model can have a mean squared forecast error 
which is substantially greater than the minimum.* 

Our approach to this problem, which is based in part on the nonlinear 


A BASE LOAD 
——e FORECASTED LOAD 
——— OUTLIER THRESHOLD 
O OUTLIER 
O SMOOTHED-BASE LOAD 


TRUNK GROUP FIRST-ROUTE LOAD 
| 
| 


1 2 3 4 5 6 7 8 
YEAR OF OPERATION 


Fig. 5—Response of outliers (sign not repeated). 


20 THE BELL SYSTEM TECHNICAL JOURNAL, JANUARY 1982 


Kalman filter model described in Ref. 4, is the following: If the 
difference between the possibly adjusted base and forecasted loads 
exceeds present thresholds, the base load is declared to be an outlier. 
In response, sPA will adjust the base load or restart at the measured 
level, depending only upon whether an outlier of the same sign oc- 
curred in the previous year. 

If an outlier of the same sign did not occur in the previous year, SPA 
replaces the base load by the nearest threshold value as indicated by 
the vertical arrows in Fig. 5. Qualitatively, the underlying assumption 
here is that in most cases such an outlier signals an invalid or atypical 
measurement caused by, for example, a recording error and not a 
change in trend. Formally, this modification of the Kalman filter is 
equivalent to that proposed by Masreliez and Martin in Ref. 4. 

Alternatively, as indicated in Fig. 6, if an outlier of the same sign 
occurs in two consecutive years, SPA restarts at the measured level. 
The assumption here is that in most cases two consecutive outliers of 
the same sign signal a change in trend. In theory, additional improve- 
ment could be obtained by restarting at the previous outlier and then 
smoothing with the current measurement. However, such a procedure 
would be more complicated to implement, and our studies show that 
it would have negligible impact on performance. 

As indicated by Huber’s studies,’ and as supported by our field test 
results,° adequate protection against outliers is obtained when the 
thresholds are set anywhere in the range of about one or two times the 
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——— OUTLIER THRESHOLD 
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Fig. 6—Response to outliers (sign repeated). 
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rms observed forecast error. This result is important: we show that it 
allows us to use thresholds which are independent of the number of 
data points processed and the type of measurement system. Numerical 
values for these thresholds are provided in Section V. 


2.4 Smoothing 


The possibly adjusted base and forecasted loads are combined to 
produce a smoothed base load and a smoothed growth increment. As 
indicated in Fig. 1, the smoothed base load is a weighted average of 
the base and 1-year forecasted loads. Similarly, the smoothed growth 
increment is a weighted average of the measured and forecasted growth 
increments. Procedures for selecting the appropriate gains a, and £,, 
are described in Section IV. 


2.5 Projection 


As indicated in Fig. 1, the smoothed base load and growth increment 
are combined to establish a linear projection. That is, the projected 
load for the kth future year is obtained by adding k smoothed growth 
increments to the smoothed base load. In practice, the resulting trunk 
group load forecasts can be adjusted to include user supplied estimates 
of the impact of deterministic events (caused by, for example, proposed 
tariff changes) or to agree with other aggregate load forecasts. 


Hl. MATHEMATICAL MODEL 
3.1 General 


Although spPA is based on a linear two-state, i.e., trunk group load 
and incremental growth, Kalman filter, we considered more complex 
models with additional state variables, e.g., aggregations of other trunk 
group loads and growth factors. Therefore, for completeness, we first 
summarize the equations which define the general discrete-time linear 
Kalman filter. For a more complete discussion, see Ref. 7. 

The true time-behavior of the state variables is assumed to be 
defined by the linear transition equation 


Xn+1 = oXn + W, 2 Us (1) 


where X, is an s-vector of true state variables in period (year) n, ¢ is 
an s X s transition matrix which may depend on n, W, is an s-vector 
of zero-mean random modelling errors, and U, is an s-vector of 
deterministic changes in state. 

The one-period-ahead projection formula is 


Xn+in == A As + On; (2) 


where Xn+z,. denotes an estimate of X,+, based on data through period 
n. The “smoothed” estimate X,,, is calculated by 
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ii = bee Bi Kn(yn — AXnn-1), (3) 


where K, is a d X s “Kalman gain” matrix and y, is a d-vector of 
measurements in period n related to X, by 


Je AXn + Vi, (4) 


where H is ad X s matrix and V,, 1s a d-vector of zero-mean measure- 
ment errors. 

The optimal gain matrix K, 1s determined recursively by the equa- 
tions 


Kn = PnH?™(HP,»H? + R) (5) 
Sn = (I — KnH)Pn, (6) 

and 
Pri = $Snb7 + Q, (7) 


where R is the covariance matrix of measurement errors, 1e., R = 
E(V,V32), Q is the covariance matrix of modelling errors, and Sp is an 
estimate of the covariance matrix of the initial state estimate Xoo. 
Furthermore, it is assumed that V; and W,; have zero mean and 
are pairwise uncorrelated among themselves and with each other for 
all i, /. 

As discussed in Section I, we will be concerned with filter perform- 
ance for nonoptimal gains. In this case, the covariance matrix P+: of 
the projection error (Xn+i1n — Xn+:) is related to the covariance matrix 
S, of Xnn by eq. (7), but S, is related to P,, by 


S, = (I — KnH)P,UI — KnH)* + K,RKz, (8) 
which reduces to eq. (6) only when K,, is given by eq. (5). 


3.2 Two-state model 


As noted in Section I, we considered models with as many as s = 8 
state variables and d = 4 measurement variables. However, our studies’ 
showed that none performed consistently or significantly better than 
the simple two-state, one-data variable model defined by the equations 


| Xn41 | _ 1 1 Xn Wn 
wefeJef ele[s] 


Yn = Xn + Un, (10) 


and 


where x, and x, are, respectively, the true load and true incremental 
growth in year n, and y, is the measured load in year n. These 
equations correspond to eqs. (1) and (4) with 
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1 1 
o= iS i (11) 


H =[1, 0]. (12) 


and 


The more complex models were rejected since the additional vari- 
ables that we considered, e.g., aggregations of other trunk group loads 
and growth factors, are usually not highly correlated with trunk group 
load. Moreover, when the correlation is high the improvement in 
performance is minimal and, more importantly, when high correlation 
is incorrectly assumed, the penalties outweigh the expected benefits.’ 

To complete the description of our two-state model, note that the 
smoothing eq. (3) becomes 


ee - ed on hve + On Yn = Xnn—1) | (13) 


Xen Xe Bo Buln — Xn,n-1) 


where a, and f, are the Kalman gains in year n. The 1-year-ahead 


projection formula is | 
j nee Oe Fes eee 
AXn+1n —_ k sl . (14) 


Note: Our analysis of the two-state model will ignore the possibility of 
deterministic changes in state; that is, we assume U, = 0. Also, by 
combining eqs. (13) and (14) it follows that eq. (13) may be written in 
the form 


bral = E ee Qi Xigpa + Qn¥n (15) 
Rig (1 me Bi) Xana + Bil Yn = Xai) ; 

which emphasizes that x,,, is a weighted average of the forecasted and 
measured levels and, similarly, x,,, is a weighted average of the 
forecasted and measured increments. 

Finally, note that R = o”, where o is the standard deviation of load 
measurement error; the dependence of o on load size and type of 
measurement system is discussed in Section 4.2.2. In the following 
section, we define the spa initialization procedure and in Section 3.4 


we describe our procedure for selecting values for the filter gains a, 
and £, and the associated assumptions about Q. 


3.3 Initialization 


As explained in Section 2.1, the smoothed-base load and growth 
increment in the first year of operation are given by 


X00 = Yo (16) 
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and 
X00 = EY0, (17) 


where g is an aggregate growth factor. For example, if g = 0.1, the 
increment is 10 percent of the base load. 

The normalized covariance matrix So/o” = (s;;/0”) of the initial state 
estimate is given by 


$11(0) = E(x0 — yo)” 











3 3 =], (18) 
S22(0) _ E( gxo — &y)” 
o” o” 
_ El(g — &)x0 + 8x0 — yo)] 
Sana sememeaatee 
27422 
= > + 62, (19) 


where g is the true growth factor and og is the standard deviation of 
the estimate g, and 


$12(0) = E[ (xo — yo) ( gxo — £yYo)] 


2 


oO O 
= E{(x0 — yo) [Cg — £)xXo0 + E(X0 — yo) ]} 
a” 
= g. (20) 


3.4 Filter parameters 


As discussed in Section I, our objective is to select robust values for 
the gains a, and £,; that is, values which will provide improved 
performance over the range of operating conditions. Our approach to 
this problem starts with the following idealistic assumption, which will 
be relaxed in a later section: 

We first assume that the true load displays constant incremental 
growth. That is, we first assume that Q, the covariance matrix of 
modelling errors, is identically zero. Under this assumption, it follows 
from eqs. (5) to (7) that the optimal gains depend only on the ratio 
So/o*. In turn, for a given value of , it follows from eqs. (18) to (20) 
that Sp/o” is determined by the ratio 


2 
2 Og 


(0/x0)” 


In Section 4.2, we display the optimal gains and corresponding 
performance (as measured by mean square forecast error) for various 


(21) 
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known values of G. More importantly, we show that an erroneously 
assumed value for G can lead to a performance worse than that of the 
conventional projection method; equivalently, since the conventional 
method is used to initialize sPA, the mean square forecast error can 
increase with the number of data points processed. 

Next, in Section 4.2.3, we show that the gains corresponding to one 
particular value of G provide a performance that is nearly independent 
of the actual value of G. We call these gains the “robust gains for linear 
growth.” 

Finally, in Section 4.3, we relax the assumption of linear growth and 
show that certain constant gains, derived from the “robust gains for 
linear growth,” provide improved performance in the presence of 
system noise, as well as the ideal case where the trend remains 
constant. 


IV. NUMERICAL RESULTS 
4.1 Examples 


To help explain our results, we first consider several examples which 
show how the gains and the mean-square forecast error depend upon 
the ratio G. In these examples, we assume that the true load has 
constant incremental growth, and we assume two-point distributions 
for measurement and growth estimation errors. That is, the measured 
load is either high or low by equal amounts and with equal probability, 
and the initial estimate of incremental growth is either high or low by 
equal amounts and with equal probability. Moreover, in each example 
we assume that the aggregate growth factor § is zero. 


A MEASURED LOAD 
——» FORECASTED LOAD 
——— TRUE LOAD 


LOAD 





YEAR 


Fig. 7—Optimal filter: no measurement error (G = ©). 
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4.1.1 No measurement error (G = ~) 


In the example illustrated in Fig. 7, we assume no measurement 
error, i.e., o? = 0, but an error of plus or minus one unit in the initial 
increment so that G = oo. Thus, with this information, the true load 
lies along the dashed line marked A with probability 0.5 or along the 
dashed line B with probability 0.5. Accordingly, the initial mean square 
forecast error is 0.5(1)? + 0.5(—1)? = 1. 

As shown in Fig. 7, the possible measurements in year 1 are M; or 
M2 with equal probability. However, since there is no measurement 
error and since two points determine a straight line, it 1s clear that in 
either case the forecast error can be reduced to zero after processing 
the second measurement; examination of eq. (13) shows that the 
appropriate gains are a; = f) = 1. 


4.1.2 No growth error (G = O) 


At the other extreme, suppose there is no error in the initial growth 
estimate, i.e., o = 0, but an error of plus or minus one unit in the 
measured load so that G = 0. As shown in Fig. 8, the trend line is 
either A or B with equal probability; hence, the initial mean square 
forecast error is 1. The possible measurements in year 1 are MM, with 
probability 0.25 [since A and a high measurement occur with proba- 
bility (0.5)(0.5)], M2 with probability 0.5, and M3 with probability 0.25. 

Since M, and M3 correspond uniquely to A and B, respectively, the 
forecast error can be reduced to zero after processing either of these 
measurements; eq. (13) and Fig. 8 show that the appropriate gains are 
a, = 0.5 and Bi = 0. However, if M2 occurs, no new information is 
obtained. Consequently, after processing the second data point, the 
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Fig. 8—Optimal filter: no growth error (G = 0). 
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LOAD 


A MEASURED LOAD 
——* FORECASTED LOAD 
—-—-— TRUE LOAD 





YEAR 


Fig. 9—Optimal filter: equal measurement and growth errors (G = 1). 


mean square forecast error is 0.5(0)? + 0.5(1)? = 0.5. Thus, the rms 
forecast error is reduced by a factor of 1/V2. 


4.1.3 Equal measurement and growth errors (G = 1) 


For values of G between the above extremes, ie., 0 < G < »™, the 
reductions in forecast error might be expected to fall between those 
corresponding to the extreme values of G. But, this is not generally 
true. 

For example, consider the case illustrated in Fig. 9 in which G = 1.0. 
We assume uncorrelated errors of plus or minus one unit in the 
measured load and in the initial increment. Thus, as shown in Fig. 9, 
the possible trend lines are A, B, C, or D, each with equal probability. 
Consequently, the initial mean square forecast error is 0.25(2)? + 
0.5(0)? + 0.25(—2)? = 2. 

In year 1 the possible measurements are M, with probability 
[0.25(0.5) = 0.125], Mz with probability 0.375 (since Mz may correspond 
to A, B, or C), Ms with probability 0.375, and M, with probability 0.125. 

If M, occurs, the trend line must be A and the forecast error could 
be reduced to zero by setting a; = % and £2; = 3. Similarly, if M4 occurs 
the trend line must be D; again a; = % and f; = % are appropriate. 

If M2 occurs, the trend line is either A, B, or C with equal probability. 
Since B is halfway between A and C in year 2, it follows that the mean 
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square forecast error is minimized by forecasting the level of B in year 
2; this result is obtained with a, = % and ~, = %. Similarly, if M3 
occurs, the forecast with these gains is on C in year 2, and the forecast 
error is minimized. 

Thus, the optimal gains are a; = %, 6; = % and, after processing the 
second data point, the mean square forecast error is 0.5(0)? + 0.25(2)? 
+ 0.25(—2)? = 2, which is identical to the initial value. Thus, for this 
example there is no reduction in forecast error after processing the 
second data point. 


4.1.4 Gunknown 


The above examples assumed that G was known exactly. Suppose 
now, however, that the gains are chosen under the assumption that 
G = ©, 1e., a1 = £, = 1, but in fact G = 0. In this case, which is 
illustrated in Fig. 10, the mean square forecast error in year 2 is 0.25(3)? 
+ 0.25(1)? + 0.25(—1)”? + 0.25(—3)? = 5, which is five times the initial 
value. 

This result, although exaggerated, is important because it shows 
that spa could actually perform worse than the conventional forecast- 
ing procedure. Indeed, this fact, combined with the practical impossi- 
bility of estimating G without error, is the major consideration in our 
decision to use the robust version of SPA described in Sections 4.2.3 
and 4.3.2. 


LOAD 


A MEASURED LOAD 
——® FORECASTED LOAD 
——— TRUE LOAD 





YEAR 


Fig. 10—Nonoptimal filter: assumed G = ©, actual G = 0. 
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4,2 Linear growth 
4.2.1 G known 


The results of this section apply when the true load has constant 
incremental growth and when the ratio G, eq. (21), is known exactly. 

Figure 11 displays the normalized rms 1-year forecast error as a 
function of the number of data points processed after initialization. 
The normalization is with respect to the rms forecast error of the 
conventional projection method. Since this method is used to initialize 
SPA, the initial normalized rms error equals 1.0. Each curve corresponds 
to a different value of G and, accordingly, to a different set of gains 
a, and B,. For example, the gains corresponding to three different 
values of G are shown in Fig. 12. 

The results shown in Fig. 11 assume that the true load displays 
constant incremental growth; under this assumption, the forecast error 
approaches zero as n increases. In practice, however, unexpected 
changes in trend will occur and, consequently, as described in Section 
2.3, SPA will occasionally be reinitialized. Accordingly, average forecast 
error over the interval between reinitializations is a more meaningful 
figure of merit. In the following paragraphs, we assume an average 
interval of five years. We emphasize, however, that this assumption 
will underestimate the benefits of sPA under more stable conditions 
and overestimate the benefits under less stable conditions. 
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Fig. 11—Optimal filter: forecast error vs. year. 
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Fig. 12—Optimal gains vs. year. 


Thus, the curve labeled optimal in Fig. 13 displays the normalized 
5-year average forecast error—the average of the first five values in 
Fig. 11—as a function of the parameter G. If, for example, G = G2 and 
we choose the gains accordingly, the average rms error would be about 
20 percent less than that for the conventional forecasting method. 
Alternatively, if G = G,, we would choose a different set of gains and 
the average reduction would be approximately 15 percent. Note as 
suggested by the examples discussed above, that the reduction in 
forecast error is a convex function of G. 


4.2.2 Gunknown 


Suppose now that the actual value of G differs from the assumed 
value. For example, suppose we use the gains corresponding to G = Gn», 
but in fact G = G,. In this case, the curve labeled G = Gy in Fig. 13 
shows that the 5-year average rms error would be about 20 percent 
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FILTER 
GAINS 


ROBUST (G = G3) 


NORMALIZED 5-YEAR AVERAGE rms FORECAST ERROR 





G, Go G3 G, Gs 
TRUE VALUE OF G 


Fig. 13—Average rms forecast error for several filters. 


larger than its initial value, in agreement with the example discussed 
in Section 4.1.4. 

The results of Fig. 13 can also be interpreted as follows: Suppose 
that G = G4 is appropriate when trunk group loads are derived from 
trunk group data. Then G = G2 = G,/3 would be appropriate when 
loads are derived from point-to-point (cmMps) data since, from Fig. 2, 
the rms measurement error for CMDs data is approximately three times 
that for trunk group data. Figure 13 then shows that an SPA tuned for 
point-to-point data would perform poorly with trunk group data. 

The results shown in Fig. 13 might suggest that we should use 
different gains for different applications. We considered this approach 
but found it to have two practical problems. 

First, as shown in Fig. 2, measurement error depends not only upon 
the type of data but also upon load size. And, although we could allow 
the gains to be a function of load size and data type, multiple versions 
of sPA would be relatively more difficult to mplement and maintain. 

‘Second, and more important, in practice it will not be possible to 
determine the exact value of G. For, in general, actual loads will not 
display constant incremental growth, but may exhibit random fluctu- 
ations about a trend line. In this case, the model described in Section 
III is still appropriate if x, denotes the trend level, instead of the trye 
load. However, o” now consists of two components: one because of the 
difference between the measured and true loads and the other because 
of the difference between the true load and the trend line. Although 
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Ref. 2 quantifies the first component, we have no way a priori to assess 
the contribution of the second component. Thus, our assumed value 
for o* may differ from the actual value. 

Similarly, since we cannot determine a priori the difference between 
our estimate of the aggregate and individual trunk group load growth 
rates, our estimate of a3, eq. (19), will, in general, differ from the actual 
value. 

Consequently, the assumed value of G will in general also differ from 
the actual value. Accordingly, to guard against the consequences of an 
error in our estimate of G, we would like to employ an algorithm which 
performs well under a variety of possible operating conditions. 


4.2.3 Robust gains for linear growth 


Fortunately, there exists a robust set of gains which provides the 
same average performance independent of the true value of G. That is, 
as shown by the curve labeled “robust” in Fig. 13, if we use the gains 
corresponding to G = G; (see Fig. 12) then the 5-year average rms 
error will be about 10 percent less than that of the conventional 
projection method—independent of the actual value of G. 

Of course, as with any robust technique, Fig. 13 shows that we pay 
a premium by receiving less than the theoretically optimal perform- 
ance for protection against the possibility of a performance substan- 
tially worse than that of the conventional projection method. However, 
we used the data from our field test” to estimate, a posteriori, the value 
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Fig. 14—Robust filter (variable gains): rms forecast error vs. year. 
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of G. Remarkably, the observed value was G = G3. Thus, at least for 
this one case, our robust filter was in fact nearly optimal. 

Although the robust gains yield an average performance which is 
independent of G, Fig. 14 shows that the actual performance in each 
year is not independent of G. For example, if G = G; the rms error of 
SPA increases above that of the conventional method by about 10 
percent in year 2, but is decreased by about 35 percent in year 5. Thus, 
during the first couple of years after initialization, SPA may perform 
slightly worse than the conventional method for some trunk groups. 
Thereafter, SPA will perform better, provided that the trend remains 
constant. 


4.3 Robust, constant gains for SPA 

4.3.1 Response to unexpected changes in trend 

With the “robust gains for linear growth,” SPA is robust to measure- 
ment and initial growth estimation errors. For practical applications, 
however, it is equally important that sPpA also be made robust to 
deviations from linear growth. 
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Fig. 15—Optimal gains with modelling error. 
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“ 


That is, the results of Section 4.2.3 assume that the trend line 
displays constant incremental growth. Under this assumption, since 
the forecast error approaches zero, the gains a, and £, approach zero 
as n increases. Accordingly, as discussed in Section 2.3, SPA would 
eventually respond to unexpected changes in the level or slope of the 
trend only when their cumulative effect produced two consecutive 
outliers of the same sign. 


4.3.2 Constant gains 


At the expense of receiving less than the theoretically optimal 
performance in the ideal case where the trend remains constant, we 
can decrease SPA’s response time to unexpected changes in trend by 
not allowing the gains a, and £, to approach zero. As discussed in Ref. 
7, one approach to selecting limiting values for a, and £, is to add 
zero-mean, uncorrelated random variables (w, and w,) to the descrip- 
tion of the true load in eq. (9). That is, we assume that the covariance 
matrix @ of modelling errors is nonzero. These modelling error terms 
lead to gains a, and £, which approach nonzero constants that depend 
upon the mean square value of these error terms. For example, Fig. 15 
shows the optimal gains corresponding to G = Gz for several values of 
E(w?) and E(w3). 

Although theoretically appealing, the above approach leaves the 
practical problem of determining appropriate values for E(w7) and 
E (w3). Values could be gleaned from a large amount of historical data, 
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Fig. 16—Robust filter (constant gains): rms forecast error vs. year. 
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which we do not have, but there is no guarantee that they would be 
appropriate for the future. 

Therefore, instead of pursuing the above approach, we simply re- 
placed the variable gains corresponding to G = G3 in Fig. 12 by the 
average of the first few terms and made the following observations: 
First, as shown in Fig. 16, the performance in the ideal case where the 
trend remains constant is nearly identical for the first six years after 
initialization to that for the robust gains for linear growth. Thereafter, 
the performance is somewhat poorer, but we anticipate that only a 
small fraction of groups will remain on a constant linear trend beyond 
6 years. Second, as illustrated in Fig. 17, when the true load displays 
random deviations from a linear trend, 1.e., when w,, W, are nonzero, 
the constant gains perform better than those designed for linear 
growth. Thus, although the performance is somewhat degraded in the 
ideal case, the use of constant gains provides protection against the 
very real possibility of unexpected changes in trend. For comparison, 
Fig. 17 also displays the performance with gains obtained by truncating 
the robust gains for linear growth in year 6. As indicated, performance 
is somewhat better in the ideal case, but worse when modelling errors 
are present. Since we do not have sufficient historical data to estimate 
the level of modelling errors, we are, therefore, recommending the use 
of the constant gains. 
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Fig. 17—Forecast error with modelling error. 
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Fig. 18—Root-mean-square observed forecast error vs. load: outlier thresholds. 


V. OUTLIER THRESHOLDS 


As discussed in Section 2.3, Huber’s studies’ show that the outlier 
thresholds may be set anywhere in the range of one to two times the 
rms observed forecast error. In theory, the observed forecast error 
depends upon the number of data points processed after initialization; 
however, since the average rms forecast error decreases by only about 
10 percent and since performance 1s not sensitive to the precise setting 
of the thresholds, we will set them based upon the initial value of the 
rms observed forecast error; that is, by | 

p” = E (x10 - yi). (22) 

Since X10 = (1 + Z)yo and x; = (1 + g)xo, it follows that 

0” = E[ (yo — Xo) + (yo — Xo) B + X0( 8 — g) — (yi — KD) 
= 20°(1 + + 8"/2) + soz, (23) 
or, since |g| <1, 
p” = xéoz + 20”. (24) 


To establish numerical values for o, we use the theoretical expres- 
sions for mean square measurement error given in Ref. 2: 


1 2xoh 
2 = — 
o a Fp + v2), (25) 


where h is the mean call-holding-time in hours, p is the sampling rate 
(for trunk group data, p = 1.0; for cmps data, p = 0.05), and 


Va = max(0, 0.13x§ — 2xoh) (26) 
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is the variance of the daily source loads. For first-routed loads, @ = 1.5 
is usually appropriate. 

Figure 18 displays p as a function of the load xo for both trunk group 
data (p = 1.0) and cmps data (p = 0.05). In each case, we assume 
6, = 0.06, which is the value observed during our field test,° and h = 
Yo. 

Based upon the results of Fig. 18 and Huber’s studies,” it follows 
that outlier thresholds set at two times the value of p for trunk group 
data should provide adequate performance for both trunk group and 
cMpDs data. That is, this setting places the thresholds within the 
allowed range of about one to two times the actual value of p for both 
trunk group and cmMDs data. 


Vi. CONCLUSIONS 


The spA employs a linear two-state Kalman filter, together with 
logic for detecting and responding to outlier measurements. The pa- 
rameters of sPA have been selected to provide improved performance 
over the range of operating conditions, including the use of either 
trunk group or point-to-point traffic measurements. 

Under the assumption of linear growth for 5-year intervals, the 
average rms 1-year forecast of SPA is about 10 percent less than that 
of the forecasting methods currently in use in the Bell System. More- 
over, the filter parameters and outlier procedures have been designed 
so that sPA will respond to changes in trend. 

Field test results confirm the theoretical results presented here. 
Accordingly, specifications have been written for the inclusion of SPA 
in the Bell System’s standard mechanized trunk forecasting systems. 
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A Sequential Projection Algorithm for Special- 
Services Demand 


By A. IONESCU-GRAFF 


(Manuscript received December 31, 1980) 


Performance analysis of the four time-independent regression 
models presently used by Bell operating companies to forecast spe- 
cial-services circuit requirements, and the characteristics of actual 
special-services demand history observed from three operating com- 
panies, indicate a need for a new method to forecast these difficult 
time series. A new special-services demand sequential projection 
algorithm (ssD-SPA) is developed based on a linear Kalman filter 
model. It includes methods to detect previous deterministic events, to 
accept and process exogenous information affecting the demand, and 
to recognize and adapt to a “no-growth” situation. Compared to the 
present algorithm, SSD-SPA generates significantly better forecasts: 
approximately 30 percent improvement in forecast accuracy and 
stability, 25 percent reduction in rms error, and 22 percent reduction 
in circuit misplacements. 


l. INTRODUCTION AND SUMMARY 


In recent years, the demand for special-services circuits has grown 
at more than twice the annual rate of the demand for message 
telephone service (9 percent versus 4 percent). This rapid growth, the 
development of new technologies, and the problems in the existing 
special-service provisioning process have led to a reexamination of the 
overall process of special-services planning and provisioning. 

Key inputs to this process are special-services demand forecasts; 
they are required for the marketing, budgeting, and engineering func- 
tions. Presently, in most Bell operating companies (BOCs), the short- 
range forecast (1 to 5 years) of point-to-point demands for interoffice 
special-service circuits 1s provided by forecasting systems based on 
time-independent trending models or by applying a user specified 
growth factor to the most recent demand. 

Previous studies of the special-service circuits life-time distribution 
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found no single, common distribution that reasonably fit the observed 
data. The time series consist mainly of small integers with demand 
levels ranging from very volatile to perfectly constant, and displaying 
numerous “jumps,” probably the effects of deterministic events. These 
data characteristics explain the inadequacy of the present time-inde- 
pendent (unweighted) regression models used to fit the past data: 
linear, exponential, and first- and second-order autoregressive. 

Consequently, a new algorithm—the special services demand se- 
quential projection algorithm (SSD-SPA)—is proposed, based on a dy- 
namic time-series model with deterministic event input, the Kalman 
filtering technique for state-vector estimation and prediction, and an 
additional procedure to process outliers. The attributes and specific 
parameters of this model are derived from the demand history for 
special services from three BOcs. 

Section II gives background information on the study. It describes 
the data available for analyses, summarizes the main characteristics of 
the demand time series to be forecasted, and presents the measures to 
be used in the empirical investigation of the algorithms’ performances. 
A brief overview of the existing forecasting models, and results of the 
forecasting algorithm performance analysis follow. A list of the desir- 
able features of a new special-services demand projection algorithm 
are derived from these results and the characteristics of the actual 
demand history mentioned in Section 2.1. 

In Section III, a linear Kalman filter model is formulated, and the 
choice of specific parameters is studied. Implementation considerations 
include initialization, outlier detection, deterministic event (level or 
growth) detection and processing, as well as filter gain selection. 
Special-services demand sequential projection algorithm forecasts are 
then tested and compared to the present forecasting algorithm. Results 
include the comparative forecast qualities for the case of small integer 
projection and an estimated economic impact of the new algorithm. 
Finally, conclusions and recommendations are summarized in Section 
IV. 

ll. BACKGROUND 

Evaluation and comparison of various demand projection algorithms 
require a description of the characteristics of the time series to be 
projected, so that the appropriateness of the model can be determined; 
also, a definition of the performance statistics used for algorithm 
comparison is needed, so that the best feasible model can be selected. 


2.1 Special-services demand 
2.1.1 Data for analyses 


The term special services refers to all Bell System services other 
than ordinary message telephone service. Examples of special services 
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are foreign exchange, tie lines, off-premise extensions, and private 
lines. The classification of special-service circuits varies from one BOC 
to another, and for a single Boc over time. For example, one BOC 
recognizes about 500 different circuit types, while another recognizes 
only 150. 

Two types of special-services history files were available from three 
major BOCs to support our studies: detailed-demand history files 
(DDHF) and grouped-demand history files (GDHF). The maximum num- 
ber of available months varied among the three Bocs: 60 for Company 
A, 67 for Company B, and 71 for Company C. The maximum could not 
exceed 71 months since this is the maximum that can be stored and 
processed by the present forecasting system. 

The DDHF contains individual records of the number of special- 
service circuits of a given Boc class of service between a pair of central 
offices (cos). The large number of possible point-to-point individual 
special-service type circuit records on the DDHF (for example, 500 types 
for each pair of COs times all possible combinations of co pairs) and 
the small size of these groups (more than 90 percent have only one 
circuit) makes any attempt to forecast each time series impractical. 
Consequently, in the design of the present forecasting system (the 
special-services forecasting system, or SSFS), the decision was made to 
group the individual records before projection. 

This grouping of DDHF records, according to a user specified grouping 
strategy, results in a GDHF. The resultant grouped special-service time 
series are the basic input to the forecasting routines and represent the 
numbers of circuits of one or more types between a given pair of 
offices. 

For the special-services demand analysis, both types of files were 
used. For the present forecasting algorithm performance study, only 
the GpuFs from Companies B and C could be used since only they had 
the format required by the input routine. These two files were also 
used for the SSD-SPA performance tests. 

Both tapes were created using grouping strategies specified by the 
facility and equipment planners: 14 grouping types in Company B and 
19 types in Company C. The file from Company B covers the time 
period between January, 1973 and July, 1978 and contains 20,036 such 
grouped records. The file from Company C extends over the period 
January, 1973 to November, 1978 and contains 41,073 records. 


2.1.2 Demand characteristics 


The special-services demand analysis identified the following signif- 
icant characteristics: | | 

(z) Very skewed circuit group size distribution, regardless of the 

grouping strategy. More than 80 percent of the point-to-point groups 
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consist of less than 10 circuits. Fig. 1 plots the maximum number of 
circuits in service over the history for each circuit group against the 
frequency of that particular size. The histograms for the three Bocs 
are remarkably similar, even though the grouping strategies used were 
different. The skewness of the size distribution would be accentuated 
if we had plotted the group sizes at a given point in time, instead of 
the maximum size over the whole history. In the same time, the long 
tail of the distribution shows that, although most of the point-to-points 
are very small in size, the majority of the special-service circuits are 
placed in a few very large groups. For example, in Company C only 6.5 
percent of the groups consist of more than 50 circuits, but these groups 
are extremely large and account for more than 75 percent of all special 
circuits in service. 

(tz) No seasonal pattern. 

(iiz) High volatility of the time series, even at high levels of aggre- 
gation. 
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Fig. 1—Maximum demand per circuit forecasting group. 
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(tv) Jumps in the demand level. This is a frequent phenomenon; 
circuit groups remain at one size for an extended period of time, then 
jump to another value, and remain at this second level for some time. 
This stepwise change in the demand level is probably caused by 
deterministic events, such as large customers moving in or out, routing 
changes, tariff changes, or market stimulation. 

(v) Constant level circuit groups. Approximately 40 percent of each 
company’s grouped point-to-point records showed no change in the 
demand level over the period of time that data were available. 

(vt) Vanishing circuit groups. About 30 percent of the groups have 
all their circuits eventually disconnected (i.e., the demand for these 
groups goes to zero) with practically no regeneration during the period. 


2.2 Algorithm performance measures 


Previous studies’” indicated that good performance measures for 
algorithm comparisons are accuracy (average forecast error), rms error 
(square root of the mean squared error), and stability (the variability 
of consecutive views of the same future period). For the present 
analysis, a fourth forecast attribute, misplacement, is defined (total 
positive or negative forecast error). 

To quantitatively measure and compare the performance of both 
SSFS and SSD-SPA forecasting procedures, we used both algorithms to 
generate demand forecasts for each circuit group (from the same data 
base), and then compared the results using relative forecast error 
statistics. 

Let: 


Yn+k = the recorded number of special-service circuits at time n + k 
Xn+kn = the forecasted demand at time n + &, given data through time 
n; 1.e., a k-period forecast. 


~ 


Then the relative accuracy, An+z,n, of the k-period forecast from period 
n is defined to be 


“ ia n~ yn 
| ee po See (1 
Yn+k 
The relative rms error, Risen is defined to be the square root of 
A 2 
Blinn = [es | (2) 
Yn+k 


The relative stability, S,+:,, of consecutive forecasts from periods 
n — 1 and n for a fixed target date n + R is defined by: 


A A 2 
A Xnt+kn —~ Xn+k,n-1 
2 7 , n ’ 
Si+kn = | —————————— | . (3) 
Yn+k 
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Treatment of yn+z = 0 1s discussed later. Accuracy and stability are 
actually measuring inaccuracy and instability. Consequently, a de- 
crease in either measure is equivalent to an improvement. All three 
performance measurements are empirical estimates of the normalized 
statistics (accuracy, rms error, and stability) as described in Ref. 2. 
Relative statistics, as opposed to absolute statistics, were used so that 
a small absolute error on a large group would not obscure large 
absolute errors on many other small groups. Network estimates of 
accuracy, stability, and rms error are produced by averaging individual 
estimates over all groups. 
We define total error (TE) to be 


Total forecast — total demand 


ia Total demand 


Note that TE can be almost zero as a result of error cancellations; 
therefore, misplacement is a better measure of the total number of 
circuits erroneously forecasted. Misplacements translate directly into 
inefficient capital expenditures. 

The positive misplacement, M+.2n, of the total number of circuits 
forecasted from period n for the target period n + k is defined by: 
N 
», di 
i=1 


“a 


+ 
Marten = 





|): (4) 
Ye 

i=] 

where i = 1, 2, ---, Nis the index over all circuit groups in the network 


and 
Ben — Ye if Fen zy re 


x 
= 0 otherwise. 


Similarly, the negative misplacement, M7:,,, of the total number of 
circuits forecasted from period n for the target period n + k is defined 
by: 
N 
» di 
A i=1 


nt+khyn = 





= r) 
»> yor 
i=] 


where 1 = 1, 2, ---, Nis the index over all circuit groups in the network 
and 
d:=y2,—-20 nn Wf £82.< 992 


= 0 otherwise. 
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We call Mi+en a measure of total network overprovisioning and the 
negative misplacement, Mnizn, a measure of total network under- 
provisioning. Positive misplacement may translate into underutiliza- 
tion, while negative misplacement may translate into orders lost or 
held, or misroutings. 

Note that TE and misplacements are related as 


TE = Misrnn — Miser. 


2.3 Present forecasting algorithm 


This section presents a brief overview of the projection algorithm 
presently used in ssFs, the performance testing procedure, and its 
results. 


2.3.1 Overview 


The present forecasting algorithm produces point-to-point demand 
forecasts of interoffice special-services circuits for the current year and 
for each of the next 5 years. 

The forecast 1s generated in two major steps—the preliminary 
forecast and the final forecast. The preliminary forecast employs one 
of four statistical models or user-stated growth factors to predict future 
circuit requirements. The four regression models are linear, exponen- 
tial, and first- and second-order autoregressive. They are used only 
when the group has sufficient demand history; at least 12 months of 
history are always required, and the default value is 24 months. Before 
forecasting, the available history is smoothed using a 3-month moving 
average. 

The parameters for each model are determined by minimizing an 
unweighted sum of squared errors over the smoothed data. The model 
with the smallest sum of squared errors or, equivalently, the model 
with the highest R® statistic (the coefficient of determination of 
“soodness of fit”), is selected. However, the exponential model is 
rejected if any of the history is zero or if it would lead to a prediction 
of explosive growth, and the autoregressive models are rejected, unless 
the demand time series is sufficiently stationary. 

Finally, if the model chosen was linear or exponential and the 
current demand has shifted significantly from the historical growth 
trend, then the forecast is also shifted to coincide with the current 
demand. A significant shift is defined relative to the estimated standard 
error of the unsmoothed demand history (excluding the current de- 
mand) from the trend line. Since at each forecast view all history is 
reprocessed to recalculate the regression parameters, treatment of 
such discontinuities may be inconsistent from one forecast view to 
another. The sensitivity of this test may be adjusted by the user; the 
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default value is two standard deviations. No adjustment of this kind is 
considered for the autoregressive models. 

When the forecast groups do not have the required number of 
months of history, forecasts are produced applying default growth 
factors to the forecast group’s current demand. 

The forecaster reviews the preliminary forecast and makes manual 
adjustments when appropriate. An example is when advance knowl- 
edge is available on new businesses moving into an area or new Services 
are being offered. 

The following section describes the results of our study to quantify 
the present forecasting algorithm performance. This analysis only 
covers the preliminary forecast. The impact of manual adjustments 
was not studied, since no records were available. The main deficiencies 
of the existing forecasting technique are summarized and explained in 
View of the demand time series characteristics. 


2.3.2 Performance analysis 


The algorithm performance is specified in terms of statistical attri- 
butes (accuracy, rms error, stability, misplacements); the analysis 
sought to verify if there is indeed a benefit in having four different 
models to choose from, to identify the main forecasting problem, and, 
based on the demand time series characteristics, to derive require- 
ments for a new forecasting algorithm. . 

A modified version of SSFS was used to produce up to three consec- 
utive forecasts for each point-to-point demand, depending on the 
length of each demand history available. To ensure compatibility with 
other planning tools, SSFs is required to produce quarterly average 
forecasts of the future demand for special services. The data files 
available extend up to 71 months, and since SSsFs requires at least 12 
months of history for the forecast initialization, the longest forecast 
that can be produced and checked against actuals is 18 to 19 quarters, 
1e., about 4 years. For simplicity, instead of estimating 18 to 19 values 
of A, R, §, tz, and M, we only looked at one quarter in each year (the 
same quarter each year, right justified by the last quarter of available 
data). Consequently, for those records with at least 60 months of 
history, three forecasts were provided, as shown in Fig. 2 (1 year 
initialization plus 4-year-span forecast, then 2 years initialization and 
3-year-span forecast, and 3 years initialization and 2-year-span fore- 
cast). Only two forecasts were produced for records with 48 to 59 
months of history (3- and 2-year-span forecasts), and only one forecast 
for records with 36 to 47 months of history. 

‘Since each forecast is made after at least 12 months of data are 
processed, only a steady-state analysis is necessary. Thus, the sub- 
scripts n for Ansin, Rn+an, and Snizn are dropped. For each circuit 
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Fig. 2—Algorithms performance analysis test plan. 


group, accuracy, rms error, and stability are estimated using all fore- 
casts produced. For example, the accuracy of a 3-year-ahead forecast 
for a circuit group with 60 months of history available is estimated by: 


Ay a 1 X77,74 — 77 A X78,75 — 78 
2 Yi77 78 


where, if g is the last quarter of available data in the last year of 


history, then 


%;,;= forecast of the average demand in quarter q; year i, made from 
quarter q; year /. 
¥; = average measurements of the demand in quarter q; year i. 


Or, stability of 3- versus 4-year-ahead forecasts for the same group is 
estimated by: 


aA aA 
X78,74 — X78,75 


78 


S3 A 








For groups where y,+; = 0, a normalization factor of 1 is used. This 
may bias the statistics (to look worse than they actually are), but since 
the objective is to compare the performance of different algorithms, 
and they all use this rule, we may expect this normalization to affect 
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them equally. In fact, the performance results measured using this 
normalization were similar to those obtained with nonvanishing groups 
only. Three consecutive forecasts were produced for about 56 percent 
of the groups (those groups with 60 to 67 months of history*), two 
forecasts for about 10 percent (48 to 59 months of history), and only 
one forecast for 9 percent (36 to 47 months of history). The remaining 
25 percent of the groups were not used, since their recorded histories 
were shorter than 36 months. 

2.3.2.1 Statistical performance. The results showed that the demand 
forecasts are often inaccurate and unstable. The numerical results can 
be deduced from the values presented in Section 3.3 on the SSD-SPA 
performance, and relative improvement versus the present SSFS. 

The accuracy histogram showed about 40 percent of the groups had 
1-year forecasts with no error. This was to be expected since about 40 
percent of the point-to-point groups have constant demand. Addition- 
ally, the existing forecasting algorithms more often overforecasted 
than underforecasted. | 

The significant instability observed for consecutive forecasts was 
due not only to the intrinsic volatility of the demand time series, but 
also to the change in forecasting models used each year. 

Small total errors resulted from cancellations of up to 50 percent 
misplaced circuits (large total misplacement). It was interesting to 
observe that although accuracy was always positive, many times 
(especially for Company C) the total error was negative. This means 
that even if on the average most of the circuit groups are overfore- 
casted, some of the very large point-to-point groups are underfore- 
casted so that the total forecast over the whole company is less than 
the realized demand. 

2.3.2.2 Correlation between forecasting model fit (R?) and projection 
error (accuracy andrms). As previously described, the existing algorithm 
chooses from the four regression models the one with the highest 
coefficient of determination (R’, or “goodness of fit”). The intuitive 
reason for this is that the curve that best fits the past data should 
extrapolate most accurately into the future. If indeed, there is a benefit 
in having four different models from which to choose, then we would 
expect to find some correlation between how well the chosen models 
fit the data (R’) and the forecast quality. Subsequent testing, designed 
to consider all combinations of models and forecasting spans, showed 
that the choice of four projection models appeared unjustified since 
the correlation between the goodness of the model fit to the history 


* The number of months of history refers to how long ago the first circuits were 
installed on that group, not to the actual length of time the demand was nonzero. About 
30 percent of the groups with more than 36 months of recorded history vanished during 
that period (demand had zero value eventually). 
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data (R’) and the forecast errors (accuracy or rms) was statistically 
insignificant. In other words, even perfect knowledge of the past does 
not necessarily imply good knowledge of the future. 

2.3.2.3 Outlier detection procedure. Many of the demand time series 
display a stepwise, highly volatile growth pattern, with the jumps 
probably generated by deterministic events. The existing shift option 
reacts to a significant difference between the actual demand and the 
forecasted value only if it happens in the last month of history. Any 
other jumps in previous months are treated as normal trend. Moreover, 
the error monitoring capability, which can detect large forecast devia- 
tions from the actual demand, is exterior to the main forecasting 
process. Consequently, the next projection cannot be improved based 
on the detected past errors. Figs. 3a and 3b give examples when the 
wrong model or parameters were selected for projection because of 
improper treatment of past special events. 

Another deficiency is the rather slow response to changes in trend; 
the equal weight assigned to each history point prevents the system 
from adjusting itself quickly to recent changes. 

2.3.2.4 Manual adjustments. The present forecasting algorithm can- 
not accept and process exogenous information. The forecaster has to 
review manually the forecasts and supply any modifications based on 
up-to-date knowledge. For example, about 70 percent of the forecasts 
in Company C are manually adjusted. 

2.3.2.5 History requirements. The special-services forecasting system 
needs at least 12 months (usually 24 months) of history to produce a 
forecast based on one of the four statistical models. If less data are 
available, growth factors are used (default or manually input values); 
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Fig. 3—Circuit groups with deterministic “jumps.” ssFs forecasting problem: special 
event treated as normal growth. (a) Example 1; (b) Example 2. 


SEQUENTIAL PROJECTION ALGORITHM 49 


15.2 percent of the Company B data base and 17 percent of the 
Company C data base consist of circuit group records with less than 24 
months of history. 


2.3.3 New algorithm requirements 


The demand time series characteristics and the results from the 
present algorithm’s performance suggest some desirable properties of 
a new algorithm: 

(t) Unequal weighting of data. Weigh the most recent data more 
heavily to allow the forecasting algorithm to adapt to dynamic changes 
in the demand pattern. 

(tz) Acceptance of exogenous information. Point-to-point demand 
levels are significantly affected by special events, such as large cus- 
tomers moving in or out, tariff changes, or by market stimulation. 
Many of these events are known in advance and their impact on the 
individual time series can be estimated. The forecasting system should 
accept those estimates and use them in projecting future demand 
levels. 

(viz) Shorter initialization period. The special-services segment of 
the total Bell System network is constantly changing and growing. 
New technologies, services, and rates are changing the customer de- 
mand patterns. Many special-service circuit groups are eventually 
disconnected (on an individual basis, 50 percent of the special-services 
circuits had a lifetime of less than 36 months; and 30 percent of the 
total number of groups “died” over a 5-year period); other groups 
come into service. Thus, a forecasting system must produce accurate 
forecasts based on small amounts of historical data, e.g., less than 12 
months. 

(iv) Recognition of past deterministic events (step changes and 
constant levels). The system should be able to recognize and react to 
“significant” changes in the demand level. Significant has to be defined 
as a function of the observed demand time series characteristics. 

(v) Forecast of small integers. About 80 percent of the special- 
services circuit groups have less than 10 circuits in service. Whatever 
the model selected for projection, it should produce stable and accurate 
forecasts of integers from 1 to 10. 

(vi) Computational efficiency. Users find it useful to run SSFS on a 
monthly basis. 


lil. SPECIAL-SERVICES DEMAND SEQUENTIAL PROJECTION 
ALGORITHM 


A linear dynamic system with linear growth and deterministic input 
is shown to be a reasonable and robust approximation for the special- 
services demand time series, and a simple (two-dimensional) linear 
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Kalman filter is selected as a method to estimate the state variables of 
this system. Filter parameter selection is examined and procedures to 
detect and respond appropriately to outliers are added to capture the 
stepwise growth pattern of the demand time series. Using data from 
Companies B and C, we test the performance of this new algorithm 
and compare it with the present algorithm. 


3.1 Linear Kalman filter model 
3.1.1 Model formulation 


In a linear dynamic model, as discussed in Reference 2, the behavior 
of the discrete time series is determined by an s-dimensional state- 
vector process {X,,}. The following two equations describe the time 
evolution of the process {X,} and the relation between X, and the 
corresponding observation yi: 


System equation: Xn+i = @nXn + Uns + Wns (5) 
Observation equation: y,=A,X,+ », (6) 


where @, 1S an $s X s state transition matrix, w, is an s-dimensional 
modeling error vector, U, is an s-dimensional deterministic input, Hn 
is ad X s observation matrix, and »p,, is a s-dimensional measurement 
noise vector. Furthermore, 


E(vn) = Elwn) = 0 


7 _|O0 if n¥J 
Blowal) = 10, if n=] Q@, an s X s matrix 


7 JO if n¥j 
Bow) ={ if n=j $R,and Xd matrix 


E(v,w?) =0 for all n,/. 


The s X s matrix, Q,, is known as the modeling error covariance 
matrix and R,, is the measurment error covariance matrix. 

In our demand analysis, it was demonstrated that no single common 
demand pattern exists for special services, but that for the majority of 
groups a linear model fit the historical data best. Furthermore, earlier 
work using Kalman filters for forecasting message trunk group loads,” 
showed that for short-term forecasting applications a linear Kalman 
filter model performs well even for nonlinear processes such as an 
exponential.* Consequently, we chose to develop a linear model that 
accounts for the special-demand characteristics discussed earlier. 


* Reference 1, for example, analyzed the performance of different Kalman filter 
models (linear, log-linear, etc.) to forecast busy season trunk group loads. It showed 
that, given measurement and modeling errors and errors in the initial state estimates, 
the linear Kalman filter would produce short-term (1 to 5 years) forecasts as good as 
any other model with respect to accuracy, rms, and stability measures. 


SEQUENTIAL PROJECTION ALGORITHM = 51 


Given the univariate measurement time series (i.e., d = 1) with 
linear growth, the special-services demand model can be represented 
by a two-dimensional linear model with the following parameters: 


es oe mn , 
m=6=(4 H, = (1, 0); (7) 


X, = (22), (8) 


where x} represents the demand level at time n, and x;, the incremental 
growth. 


3.1.2 Kalman filter (filtering and prediction) © 


The Kalman filter is a recursive method that produces a minimum 
variance unbiased estimate of the state vector {X,,} of a dynamic linear 
system from noisy observations yi, --+, yn and uses these estimates to 
predict future state values. 

Let > ene be the estimate of the state vector X, based on information 
available through time n — 1. Let 


Pp = E[(Xn ~ Xnn-1)(Xn — Xnn-1)7] 
be the one-step prediction error covariance matrix and 
S, = E[(Kn — Xan)(Kn — Xnz)7] 


be the estimation error covariance matrix. Then, given a prior estimate 
of the system state X,,,-1, the filtering problem is to find an updated 
estimated X,,n, based on the measurement yn. 

The unbiased estimate is given by the linear recursive form 


Xs = p eee + Knl¥n — AX 5) ; (9) 


where K,, is a time-varying weighting matrix known as the Kalman 
filter gain matrix. The optimal* K,, is given by 


Kn = PrHi(HnPnAn + Rn)”. (10) 
The error covariance matrices are found to be: 
S, = (I — KiHn)Pr, (11) 
where I is the s X s identity matrix, and 
Pri = dnSnbn + Qn. (12) 


* The criterion of optimality is to minimize the mean square estimation error. When 
w, and v, are normally distributed, then the same result is obtained by a Bayesian 
method or the method of maximum likelihood. 
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The estimates of the future state vectors are obtained by extrapo- 
lation using eq. (5) 


p nea = re Cay eS + Un-+e 
k~1 " k-1 k-1 
= (iI ent Xn + PY Uns (1 bem] = | Qe . (13) 
a zz] msl 


It should be mentioned that if w, and », are Gaussian, the Kalman 
filter estimate is at least as good as any other estimate (either linear or 
nonlinear). If the noise terms cannot be assumed normal, then the 
Kalman filter yields the optimal linear unbiased minimum variance 
estimate, but there may be a nonlinear estimate that is superior in 
mean square error.*” 

To implement the above described algorithm, we note that: 

(tz) An initial state estimate and error covariance are necessary to 
start the recursion. This problem will be considered in Section 3.2.1. 

(ti) Since the estimation error covariance matrix S,, and prediction 
error covariance matrix P, do not rely on observed data, for given 
sequences {Q,}, {Rn}, and initial P,,,,* the gain sequence {K,} can be 
precalculated. Specification of Q, and R,, will be discussed in Section 
3.2.2. The choice of the gain sequence will be examined in Section 
3.2.3. 

(iii) It is not necessary to store the history {yi, --- , Yn} since all 
relevant information concerning the series is included in the state 
vector estimate Xin. 

(iv) The algorithm assumes the knowledge of the future determin- 
istic events {U,,}. If estimates of the impact of these events are not 
available (user input) or are in error, the system needs a recovery 
procedure. However, when a significant change in the demand is 
observed, the algorithm has to differentiate between outliers (because 
of measurement errors, or demand volatility) and deterministic events. 
The problems of outlier detection and response to special events are 
considered in Sections 3.2.4 and 3.2.5. 


3.2 Implementation considerations and parameter selection 


In most applications, the exact statistical structure of the individual 
time series is unknown. Consequently, implementation of the Kalman 
filter model requires selection of estimated values for the algorithm 
parameters, usually through experimentation. Three methods to ob- 
tain initial estimates for X,,»,-1 and P,, will be analyzed next. Then, 
the specification of R and @ and the choice of gains and outlier 
thresholds will be considered. 


* Time no is the assumed “present time” for filter initialization, given the available 
data history {y1, +++ Yn) +** Yn}. 
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3.2.1 Filter initialization 


Although the special-services segment of the total network is rapidly 
growing (at a rate of more than 9 percent per year) there are very few 
circuit groups or point-to-point disaggregated groups constantly grow- 
ing. Most of them vary around a constant value, many of the groups 
have all their circuits disconnected (about 30 percent of the groups 
“die” over a 5-year period), and more new groups appear. 

This frequent in-and-out activity rules out delaying the forecast 
until sufficient data is available to make accurate estimates of X,,,n,—1 
and P,,,. It is important to have initial state-vector estimates as soon 
as observations are available. 

We considered three filter initialization methods for implementation 
in ssFs. Subsequent testing on actual data files (described in Section 
3.3) was used to decide on the most appropriate one. Each method 
assumed that the length of every circuit group history is somewhere 
between 2 and 71 months.* As mentioned in Section 2.2, we look at 
quarterly average values of the demand for special services. The three 
methods are the following: 3 

(t) Estimate >, eae and P,,, by unweighted least squares. Assume 
a linear first-order model of the form y = Bp + Biz + €, where z is the 
time variable (in our case, it is just the index of the observations, since 
the seasonal analysis can be assumed equally spaced in time), and e, 
an error variable with mean zero and unknown variance o”. Given the 
observations y = (yo, yi, -**, Yn-1) taken at times z = (0, 1, ---, 
n — 1), yn is estimated (by least squares) by J, = Bo + Piz, and Jo = 
Bo. Therefore, 


In = Bo + Bizn = Bo + Bin = Jn + Bi, 
ee | = ny Rind = BR, pi = var Vn , 
p2=var fi, and pi =p? = cov(jn, fi) 
= %(var 9, + var B, — var Yn+i). 


The estimates Bo, B,, ¥n, and 6G” are obtained with the usual regression 
formulas (as in Ref. 6). 

(iz) Use the present ssFs model prediction for » en and estimate 
P,,, from method (i). 

(tii) Use the first quarter of data (2 or 3 months) to obtain a crude 
estimate X1,(£11 = quarterly average, £2; = the slope of a line best 
fitting the data). Then use the Kalman filter algorithm sequentially on 
each quarterly average demand up to the current date. Estimate, as in 


* When only one month is available, an arbitrary growth factor has to be used, and 
71 months is the maximum history length stored by the ssFs history files. 
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(z), the initial error covariance matrix as the average of the errors in 
extrapolating for the 3 months in the second quarter. | 

Intuitively, this last method was expected to perform best, since, in 
most of the cases, enough history was available for the filter to achieve 
steady-state performance. Experimental testing of these initialization 
procedures indicated that indeed method (viz), the use of the filter 
algorithms as early in the history as possible, resulted in the best 
initial state-vector estimates > oe and P,,,. 

Weighted least squares, using minimum variance initialization esti- 
mates,’ was not considered because special-services demand data may 
have many deterministic jumps. A distinction between these jumps 
and possible outliers could not be made since there were no records to 
indicate when such significant events occurred. This lack of informa- 
tion is equivalent to changing the characteristics of the vector U,, in 
eq. (5) into a random variable with unknown distribution. 


3.2.2 Specification of R and Q 


Various procedures exist for the estimation of the {R,} and {Q,} 
parameters. The methods vary in their relative complexity and the 
number of assumptions needed for the underlying statistical properties 
of the system. In most applications with relatively short time series, 
little improvement in performance is expected from a highly sophisti- 
cated specification procedure. A simpler method is used: instead of 
trying to identify {R,} and {Q,} for each individual time series, a 
scalar R and a matrix Q are determined that approximate the general 
nature of all series considered. Consequently, only one gain sequence 
{K,} and initialization matrix P,,, has to be precomputed and applied 
to all circuit histories. 

In our case, estimation of R and Q is obscured by the occurrence of 
deterministic events. For example, to estimate R, the series first has to 
be cleansed of special events, but any special events recognition is 
based on a measure of R. Nevertheless, upper bounds for the measure- 
ment error variance can be estimated assuming no deterministic 
events. The estimate measurement error, R, was found to be approxi- 
mately 5 for Company B and 19 for Company C. 

It can be shown** that the calculation of the gain sequence depends 
only on the relative magnitude of Q compared to R. Hence, if R is 
normalized to unity, only @ needs to be specified. We discuss the 
influence of R and Q on the gain sequence, filter responsiveness, and 
the selection of specific values for Q, based on experimental testing, in 
the next section. 


3.2.3 The gain sequence 


Accurate specification of the elements of Q is important, especially 
as they affect the gain sequence {K,,} values for large n. Fig. 4 shows 
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Fig. 4—Kalman filter gain sequences. (a) K;, sequence; (b) K;, sequence. 


how the gain sequence is influenced by different assumptions made on 
the elements of Q. For Q = 0 the gain sequence converges to zero, 
since @ = 0 is equivalent to a process {X,,} evolving deterministically 
relative to the initial set of parameters (see Fig. 4, curve A). A nonzero 
Q will force the sequence {K,} to give enough weight to new obser- 
vations y, so that a nonstationary process is correctly tracked by the 
filter (Fig. 4, curve B). 

The choice of {K,,} is based on the desire to be responsive to changes 
in demand, while maintaining relatively stable forecasts. To obtain 
this result even when the true statistical nature is not known and Q 
estimates are difficult to obtain, a truncated gain sequence can be 
used.” A truncated gain sequence Kj, (Fig. 4, curve C) is defined as 


K, wWwnsn* 
Kk,= and n*=1, 
Kn« if n>n* 


where K,, is calculated under the false assumption that Q = 0, and n* 
is empirically determined to ensure sufficient responsiveness and near- 
optimal filter performance. Another advantage in using the K;,, se- 
quence over the optimal sequence is that a finite vector {Kj, --- , Kn} 
can be computed and stored. [Fig. 4 is derived from eqs. (10), (11), and 
(12), and estimates of R, Q, and P,,.] 

Given the demand data characteristics in our case, the matrix Q had 
to be nonzero to ensure filter responsiveness to random variation in 
the model parameters. For the normalized Ro value of 1, different 
matrices @ were tested and corresponding steady-state gains calcu- . 
lated. : 

_ Figure 5 shows the theoretical performance of different gain se- 
quences for the ssp-spA algorithm when the true modeling error 
covariance matrix is constant and not zero (Q # 0): Curve A is the 
theoretical performance when the gain sequence is calculated under 
the false assumption that Q, = 0, curve B is obtained when the true Q 
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Fig. 5—Theoretical algorithm performance. 


is used in obtaining the gain sequence, while curve C shows the 
theoretical performance of the truncated gain sequence (n* = 10), also 
computed under the assumption that Q, = 0. This figure is derived 
from the generalized formula for S,, 


Sn = (I — KnHn)Pn(I — KnHn)" + KnRnKn, 


which is true for an arbitrary gain matrix K,.*’ 

The initial values of the gain sequence depend on the P,, values. 
The absolute values of the P,,, were varied to obtain the transient gains 
(Kn, n = 14) that gave the best algorithm performance. This P,,, and @ 
produced a near constant (over time) gain-vector sequence. Further 
testing on many time series indicated that a single algorithm using 
constant gains performs as well as any other. This empirical result is 
substantiated by theoretical near-optimal performance of a constant 
gain sequence, shown in Fig. 5, curve D. The constant gain was selected 
to be the best approximation obtained to the optimal gain. Conse- 
quently, constant gains were selected for the SSD-SPA implementation. 


3.2.4 Outlier detection and data validation 


An important characteristic of the special-service demand time 
series is the high volatility. This volatility affects the expected quality 
of the forecasting procedures and the determination of outlier detection 
screens and responses. Usually, outliers are defined as those measure- 
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ments that are significantly different from the trend because of data 
volatility, measurement errors, or deterministic changes in the level. 
Significance is determined based on the observed measurement vari- 
ance about the assumed trend line (usually a band of 2 to 3 standard 
deviations around the expected line). Once a measurement falls beyond 
these boundaries, the outlier detection routine determines whether the 
measurement indicates a change in the level of the trend or whether 
it is an outlier (data volatility or measurement error). The former case 
is decided based on subsequent measurements, i.e., if the following 
data conform to the change. In the latter case, the measurement has 
to be partially or complete ignored, based on the probability of being 
a true measurement of volatile demand or an erroneous one. 

Previous studies indicated that the majority of the grouped demand 
time series is truly of a highly volatile nature with sudden changes; 
large and frequent changes (up to 1000 circuits added or subtracted in 
a single month), many times in opposite directions (big rise in level 
followed shortly by a big drop), were evident. 

Consequently, it is impossible in the case of the special-services 
demand data to decide if an outlier was produced by a data-base error; 
therefore, no data validation decisions are recommended for the outlier 
detection routines. 

Instead, such errors if present will be handled by the deterministic 
event detection and response procedure described in the next section. 


3.2.5 Deterministic events detection and response 


As mentioned in Section 2.1, two other important demand patterns 
must be considered in designing a projection algorithm: 

(t) Significant changes in the demand level when subsequent ob- 
servations confirm the supposition that a special event had taken 
place. 

(it) Zero growth when the time series remain for long periods of 
time at constant levels. 

3.2.5.1 Step changes in demand levels. Since data is available 
monthly, detection of significant level changes should be made 
monthly even though the forecast is made quarterly. In this way, a 
quarterly response will be the result of at least three, and up to a 
maximum of six monthly movements. Let 


di = Ymonth; — Ymonth,_p 

1; = (Ymonth, + Ymonth,,)/2, and 

d?¥ = maximum value |d;| can have before it is considered significant. 
Two functional forms are usually used to calculate d?: the linear 
function d* = a + bd; or the mixed linear-exponential function d7 = 
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d(a + be), In general, the latter form has the advantage that it 
increases the boundaries, percentagewise for small values of d;. For 
the special-services demand time series, the integral nature of the d;’s 
made this advantage insignificant. Optimum values for a, 6, and c 
parameters were experimentally tested for both functional forms, but 
no improvement was found in the algorithm performance when the 
mixed linear-exponential boundaries were used. Since the simple linear 
form reduced the total computational time, we recommended the 
following linear deterministic event boundaries for SSD-SPA: 


d¥ = 0.7 +0.11 (month, + Ymonth,,) forall i= 2. 


We present next a brief description of the detection and response to 

past deterministic changes in the demand. 
(t) Detection step 

This procedure first determines if the given d;j-1 1s significant, and 
second if the subsequent d; confirms this event. This confirmation 
means that either d; has the same sign as d;-;, or the net difference 
between |d;| and | d;-1| is large enough to be a deterministic event by 
itself. There are four possible cases: d;-; > 0 and d; > 0; di-1 < 0 and 
d; < 0; dj-: > 0 and d; < 0; and d;-; < 0 and d; > 0. In the first two 
cases, dj-) is confirmed, since the next movement has the same sign. In 
the last two cases, the movements have opposite signs. To distinguish 
between volatility and special events, we subtract from both what can 
be attributed to volatility, ie. minimum {|d;|, |d;-1|}. There are, then, 
two cases: |d;-,| > |d,|, and \di-a| < |d;|. 

Case 1: |dj-:| >|d;|. Then new dj-1 = d;-1 + d; and new d; = 0. If 
_d‘-1 compared to d?, is significant, then di-; is a special event and 
d! = 0. If not, both dj and dj_; are zero. 

Case 2: jdj-:| < Idi. Then new d/ = d; + d;-1 and new dj-; = 0. If 
|d/| = d7, then dj is a special event and dj_, = 0. Otherwise both are 
zero. 

(uz) Response step 

From these monthly detected level changes, the quarterly events 
have to be calculated. A quarterly value Y; is an average of three 
months: yj, yi+1, and ¥;+2. (The Kalman filter model uses this Y; as data, 
as described in paragraphs 3.2.1 and 3.2.2.) The effect of a monthly 
change on the quarterly average depends on the position of the month 
in that quarter. If the change happens in the first month of the quarter 
(d;), then all y’s in Y; are moved to this second level, and the change 
in quarterly averages is D; = Y; — Y;-1 = dj. If the change happens in 
the second month (d;+1), then only y;+1 and yi+2 reflect the change, and 
Y; — Yj-1 = *%di+1. The remaining 14d;,1 will appear as a difference 
between Y; and Yj+. Finally, if the change appears at +i+2,(di+2), then 
Y; — Yj-1 = ‘“di+2 and Yj+1 — Yj = *hdi+e . 
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Consequently, the changes observed on quarterly values are deter- 
mined by five possible monthly events: 


D; = Yadi-2 + *di-1 + di + dis: + “Adi+e. 


Since we do not know how much of any D; is actually normal growth, 
we recommend that when D; fully explains the difference between Y; 
and Y;-1, to consider that it already included the growth. Certainly D; 
is not available before Y;, and therefore, the state estimate £};-1 is 
calculated first using the estimates i} ;-1 of future deterministic events 
which can be input to SSD-SPA, or from eq. 13: 


A A A, 
Xj, j-1 = OXj-1,j;-1 + U;, j-1 


After D; can be calculated (D; = i} ; = estimate of u; after Y; has been 
observed), then the state estimate £}, ;-1 can be updated by: 


Kh j-1 <— X} 5-1 _ tj, 5-1 + D; if D; xf Y; _ Yi or D; = 0 
Al Al nm é‘ : 
<—X,j-1 7 Uj j-1 F D; _— Rag if D; = Y; _ Yi and D|# 0. 


Then, eq. 9 follows to calculate x; ;. 

Since events often do not occur as planned, this procedure also 
ensures algorithm recovery when erroneous estimates of future deter- 
ministic events are input to SSD-SPA. 


3.2.5.2 Zero growth. Two quarters (or 6 months) with constant level 
of demand are regarded as sufficient evidence that the main tendency 
of that particular circuit group is to stay at that level for a longer 
period of time. However, if the filter estimate of the growth is not very 
close to zero, it may take many quarters to finally converge to zero 
since the filter has to be robust enough to perform on other higher 
volatile series. An appropriate procedure to force the growth estimate 
to converge to zero faster is to reduce the growth estimate (£2) by a 
factor y (i.e., x7,n becomes x7,,/y) whenever zero quarterly growth is 
observed. Subsequent testing found y = 2 to be a good value and 
concluded that this test is very robust for small variations of the y 
parameter. 


3.3 Performance analysis 


Three objectives were identified for the SSD-SPA performance anal- 
ysis: First, to determine and quantify the improvement in forecast 
accuracy, rms error, stability, and misplacements relative to the exist- 
ing forecasting algorithm (described in Section 2.3). Second, to deter- 
mine if the proposed algorithm has the desired properties (listed in 
Section 2.3.3) derived from the special-service demand data character- 
istics. Third, to assess the potential economic benefits resulting from 
incorporating SSD-SPA into SSFSs versus its implementation costs. 
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The SSD-SPA was evaluated quantitatively using the accuracy, rms 
error, stability, and misplacement relative error statistics. To ensure 
relative algorithm performance analysis consistency, the same data 
was used as in the ssFs study (Sections 2.1 and 2.3.2). For these time 
series, equivalent consecutive forecasts were produced using the new 
sequential projection algorithm, and forecast performance measures 
were calculated. 

Network aggregated error statistics were used in the selection of 
algorithm parameters, as well as in comparing the new SSD-SPA per- 
formance to the present algorithm. 

The resulting forecasting algorithm was found to be robust over 
small variations of all parameters around their optimum values. 


3.3.1 Results: Accuracy, rms error, stability, misplacements 


Figure 6 displays graphically the performance of the SSD-SPA using 
both companies’ history data. Special-services demand sequential pro- 
jection algorithm generates forecasts that are significantly more ac- 
curate and stable. Tables I and II give the SSD-SPA versus present 
algorithm relative improvements in forecast accuracy, rms error, sta- 
bility, and total misplacement. 

Figures 7a and 7b present two examples of the SSD-SPA versus 
present algorithm total error (TE) and misplacement (M) relative 
improvements for the forecasts generated in 1974. 

Figure 8 gives histograms of relative improvement for the 1-year- 
ahead forecast accuracy, rms error, total misplacements, and stability 


75 150 


50 100 


25 50 


PERCENT ACCURACY 
PERCENT rms ERROR 





FORECAST SPAN (YEARS) FORECAST SPAN (YEARS) 


——=- COMPANY B 
COMPANY C 





PERCENT STABILITY 





lvs.2 2vs.3 3vs.4 
FORECAST SPAN (YEARS) 


Fig. 6—SSD-SPA network average forecasting performance. 
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Table I—SSD-SPA Percent relative forecast improvement over 
present algorithm 


Com- 1-Year 2-Year 3-Year 4-Year 
Forecast Attribute pany Span Span Span Span 

Accuracy C 30.0 29.6 29.6 31.9 

B 25.0 24.0 25.0 29.0 
Root-mean-square C 17.8 18.3 20.7 23.0 

B 19.7 19.1 22.2 24.0 
Stability C 15.0 27.0 38.8 

B 20.0 33.0 51.0 


Table II—SSD-SPA Percent relative improvement in total 
misplacements over present algorithm 


Forecasted Year 
Base Year 


Company Forecast 1975 1976 1977 1978 
C 1974 17.4 19.5 20.2 21.6 
1975 18.3 15.7 13.7 

1976 18.3 13.1 

B 1974 21.9 19.2 19.0 23.3 
1975 17.8 19.0 22.4 

1976 22.9 22.3 


(1 versus 2-years-ahead for stability). Much of the observed improved 
performance is because the new algorithm can detect and properly 
respond to step changes in the demand level. Figs. 9a and 9b show 
how SSD-SPA processes the data shown previously in Figs. 3a and 3b as 
examples of SsFs poor performance. 

It should be noted that both examples only show how past deter- 
ministic events (before the start of the forecasting period, i.e., before 
July, 1974, in Fig. 9a, and July, 1975, in Fig. 9b) are treated. No 
knowledge was assumed about future special events, such as the one 
on October, 1976 (Fig. 9b). Once the data up to these events are 
available, even if no, incomplete, or wrong information would be input 
into SsD-SPA, the algorithm could recognize them and properly adjust 
the forecast, as was shown for the events on May, 1974 (Fig. 9a) and 
February, 1974 (Fig. 9b). The present ssFs algorithms treated these 
events as part of the normal growth, as shown in Figs. 3a and 3b. 


3.3.2 Small integer forecast 


In Section 2.3.3, we stated six desirable properties for the new 
forecasting algorithm based on the demand time series characteristics. 

The unequal weighting of data, acceptance of exogenous informa- 
tion, and a short initialization period are shown to be part of the 
proposed mathematical model itself (Section 3.1). The recursive filter 
model adds computational efficiency since it does not explicitly use 
past data. Recognition of the past deterministic events and algorithm 
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Fig. 7—SSD-SPA versus SSFS: total error and misplacement (forecasts generated in 
1974). (a) Company C; (b) Company B. 


recovery are ensured by the two procedures described in Section 3.2.5. 
It only remains to see if SSD-SPA performs adequately when small 
integers are to be forecasted. 

To quantify this, the tests were repeated using only those point-to- 
point time series consisting of integers less than 10 (approximately 80 
percent of all point-to-point demand time series). 

Results of these tests on both companies’ data bases showed that 
for small integers the relative forecast improvement of SSD-SPA is 
about 50 percent in accuracy, 30 percent in rms error, 30 to 66 percent 
in stability, and 50 percent in total misplacement. Moreover, total 
forecast error was found to range between 1 to 3 percent for SSD-SPA 
versus 3 to 28 percent for the present method. These last results 
excluded the “vanishing” time series in order to obtain unbiased 
attribute estimates. 
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Fig. 8—-SSD-SPA versus SSFS: percent relative improvements in 1-year forecast attri- 
butes for Company C. 


3.3.3 Economic benefits and implementation costs 


The comparative study of the present ssrs forecasting algorithm 
and the ssp-spA showed that the new algorithm generates forecasts 
that are significantly more accurate and stable. Implementation of 
SSD-SPA in SSFS would, therefore, translate into important economic 
benefits in three areas: capital expenditures, forecasters’ time, and 
electronic data processing costs. 

(x) The major impact is expected to be on capital savings. The 
following analysis is based on the SsFs preliminary forecast before any 
manual adjustments are made. (There are no records available with 
the final adjusted forecasts made at different times in the past, nor 
with the exogenous information available to the forecaster.) The 
results showed the l-year ssFs forecast positive misplacement of 
circuits to be 12 percent, on the average. That is, 12 percent of the 
total special-services circuits in the 1-year forecast could be in the 
wrong groups resulting in an overprovisioning. One-year results are 
used to be conservative; additionally, for a 1-year error there is less 
chance to reuse misplaced facilities. 
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Fig. 9—Circuit groups with deterministic jumps. SSD-SPA treatment. (a) Example 1; 
(b) Example 2. 


The SSD-SPA reduces the misplacement to 7 percent; a reduction of 
5 percent. Theoretically, 5 percent of the total special-services circuit 
network could be removed without a change in service. Underprovi- 
sioning is approximately the same for both algorithms.* 

(ti) The improved forecast accuracy, the recognition of past deter- 
ministic events, and the shorter forecast initialization requirements 
are SSD-SPA features that translate into fewer manual forecast adjust- 
ments. Fewer adjustments would permit the forecasters to concentrate 
more of their efforts to follow the economic conditions and estimate 
their impact on the future demand for special services. 

(tii) The SSD-SPA is based on one forecasting model only and makes 
no explicit use of all the data history. Consequently, run times and 
core usage would be reduced. Although the absolute savings are not 
large, they would make SSsFs very suitable for an on-line use. 


IV. CONCLUSIONS 


The goal of our work was to design an algorithm able to forecast 
future demands for special services: highly volatile time series mainly 
consisting of small integers, and with numerous deterministic jumps. 
We have shown that a linear, dynamic time-series model with linear 
growth and deterministic input, together with the Kalman filtering 
technique for state vector estimation and prediction, can produce 
demand forecasts which are significantly more accurate and stable 


* This apparent positive bias is due to two types of groups. The first is those groups 
which “vanish” during the period. The second is those in which large deterministic 
events occurred. In the new algorithm, these events can be handled by input of marketing 
information. 
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than the forecasts produced by the best (highest R) choice of four 
unweighted regression models: the linear, exponential, and first- and 
second-order autoregressive. The new model, its attributes, and spe- 
cific parameters were selected based on the characteristics of actual 
special-service demand history from three Bocs. 

The improvement in accuracy is due to the capability of the system 
to track nonstationary processes, and also to recognize and react 
properly to deterministic changes in the demand, even when no, or 
wrong exogenous input was available. The use of a single model is 
responsible for much of the stability improvement. Additionally, ssp- 
SPA can produce many views of future demand using different assump- 
tions on future events, it requires a short initialization period, and it 
results in the need for fewer manual adjustments. Therefore, we 
propose to replace the existing algorithm in the ssFs by this simple 
and more efficient algorithm. 
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Demand Servicing 
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Trunk servicing ts the continual process of collecting trunk group 
traffic measurements, monitoring network service, and augmenting 
the network when necessary. This study addresses the possibility of 
using a short-term forecast to determine the adequacy of trunk 
quantities planned for the imminent busy season. When seasonal 
patterns of demand exist, it may be possible to use observed, pre-busy- 
season traffic levels to predict accurately that busy-season demand 
will exceed the planned trunk group capacity and to determine 
appropriate corrective action. Toward this end, we develop a seasonal 
load forecasting algorithm based on Kalman filter estimation tech- 
niques and analyze the effectiveness of this approach using Bell 
operating company data. For trunk groups exhibiting seasonal de- 
mand, the short-term (1% months ahead), seasonal forecast error is 
50 percent less than that of the sequential projection algorithm (SPA), 
which linearly trends the yearly busy-season loads. Much of this 
improvement ts attributed to the ability of the seasonal algorithm to 
utilize recent observations; the one-year ahead seasonal forecast 
error is only 20 percent less than that of sPpA. We conclude that the 
greater generality and simplicity of spA makes that algorithm the 
appropriate choice for the annual busy-season trunk forecast used in 
medium-range network planning. However, the seasonal algorithm 
demonstrated the ability to use recent data to respond quickly and 
accurately to various situations that result in inaccurate SPA fore- 
casts. For this reason, the short-term forecasting algorithm developed 
herein is a potentially valuable tool for network administration. 


l. INTRODUCTION 
1.1 Motivation 


Demand servicing is the process responsible for detecting and cor- 
recting overload conditions in the trunk network. Such conditions 
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inevitably occur when unanticipated traffic levels exceed the planned 
capacity, which must be maintained at a reasonably low level to 
provide good service at low cost. 

The planned capacity for each trunk group is determined primarily 
by the annual forecast of busy-season trunk requirements. Trunk 
servicing, which includes demand servicing, is the continual process of 
collecting trunk group traffic data, monitoring network service, and 
augmenting the network when needed. This paper addresses the 
possibility of supplementing the annual forecast and weekly monitoring 
process with a short-term forecast of imminent busy-season require- 
ments. Specifically, when seasonal patterns of demand exist, it may be 
possible to use observed, pre-busy-season traffic levels to predict 
accurately that busy-season demand will exceed the planned trunk 
group capacity. Thus, service problems may be predicted and possibly 
avoided by “anticipative” demand servicing action. 

When the need for demand servicing arises, the trunk servicer must 
decide on the locations and magnitudes of trunk group augments 
required to restore service. If current traffic levels already exceed those 
forecast, the servicer would like to know whether the peak load level 
has already been reached, or whether even higher levels are imminent. 
For each trunk group that must be augmented, the servicer should 
know the minimum amount of additional capacity required to both 
relieve the existing problem and to provide adequate service through 
the remainder of the busy season. 


1.2 Short-term trunk forecast 


Both of the trunk servicing functions described above, namely, the 
anticipation of imminent service problems and the determination of 
appropriate demand servicing augments, could be performed with the 
use of a short-term trunk forecasting system. Such a system would 
recognize within-the-year demand patterns and use this information 
to make accurate, short-term predictions of busy-season load. As such, 
the short-term forecast would serve as a “back-up” to the yearly busy- 
season forecast, recommending remedial action in those cases where 
the latter is significantly in error. 

The purpose of this study is to investigate a short-term forecast to 
provide the trunk servicer with accurate, useful information concerning 
near-term traffic levels. With such a tool, service problems could be 
avoided by anticipative demand servicing and useful reserve capacity 
could be identified. This would allow the servicer to make more 
efficient use of existing facilities and equipment, thus, reducing the 
amount of reserve capacity required to maintain good service. 


1.3 Overview 


Section I is a discussion of the general requirements of a short-term 
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trunk forecasting system. Motivated by these requirements, we con- 
sider in Section III the class of linear dynamic models and show how 
these models can be used to represent trunk group load histories 
exhibiting seasonal variation. In Section IV, we discuss a recursive 
estimation procedure, known as the Kalman filter, that is appropriate 
for this class of models. Using trunk group data obtained from a Bell 
operating company (BOC), we test the performance of a specific fore- 
casting algorithm in Section V, and compare its performance with the 
year-to-year forecast produced by the recently developed sequential 
projection algorithm (sPA).’* Section VI summarizes our findings. 


H. SYSTEM REQUIREMENTS 


The selection of an appropriate class of time-series models and 
forecasting procedures for consideration in this study depends heavily 
on the intended mode of operation and operating environment. The 
general requirements of a short-term trunk forecasting system are 
discussed in this section. These requirements will motivate the class of 
time-series forecasting algorithms considered in Section III. 


2.1 Time series model 


Underlying any time-series forecasting procedure is a mathematical 
model describing the structure of the series being forecast. For short- 
term trunk forecasting, the time-series model used must be sufficiently 
flexible to model a wide range of trunk group growth patterns accu- 
rately and to track changes in these growth patterns closely. This 
requirement is shown in Fig. 1. Figure la shows the load history of a 
Boc only-route group exhibiting a highly regular seasonal pattern 
modulating a nearly constant, linear trend. The behavior of such — 
demand can be accurately predicted by an reasonably appropriate 
procedure. In particular, the annual 1-year SPA forecasts of busy-season 
demand are quite accurate. 

In contrast, the trunk group load history shown in Fig. 1b requires 
a more sophisticated treatment. The distinctly nonconstant trend leads 
to gross errors in the spa forecasts of busy-season loads, since the SPA 
algorithm design assumes approximately linear growth or decline. 

Although the growth pattern shown in this figure is clearly noncon- 
stant, it is not unpredictable. The short-term trunk forecasting algo- 
rithm should have sufficient built-in flexibility to accommodate such 
behavior. 


2.2 Data requirements 


To provide good service at low cost, the Bell System network is 
constantly evolving. This evolution accounts for the introduction of 
new and improved technology with the accompanying changes in 
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Fig. l1—(a) Trunk group AA001444—annual busy-season forecast. (b) Trunk group 
AA024225—annual busy-season forecast. 


network configuration, and for change and growth in customer demand 
patterns. As the network evolves, new trunk groups, connecting new 
switching systems in a more economical network configuration, come 
into service and serve traffic previously carried by existing groups, 
which may be phased out of service. Also, in the past, trunk group 
histories were not maintained and data collection schedules were less 
comprehensive. For these reasons, long, uninterrupted trunk group 
load histories are, and will continue to be, atypical. 

If it is to be useful, the short-term trunk forecasting system must be 
capable of operating in this type of environment. It should be capable 
of producing accurate forecasts based on relatively small amounts of 
historical data (e.g., 2 years). Also, it should be able to process and 
respond to information concerning semideterministic events (e.g., main 
station transfers and traffic reroutes) that affect the trunk group load 
pattern. 


70 THE BELL SYSTEM TECHNICAL JOURNAL, JANUARY 1982 


2.3 Computational requirements 


In most Bocs, the trunk servicer relies on the Trunk Servicing 
System (Tss) to process recent trunk group traffic measurements and 
to obtain estimates of current load levels, traffic characteristics, and 
trunk requirements.’ On a weekly basis, the system must process 
measurements and produce reports on several thousand trunk groups. 
Therefore, when considering candidate short-term forecasting algo- 
rithms for implementation in such a system, computational efficiency 
is of great importance. Equally important is the need for mechanized 
procedures requiring minimal intervention by the servicer, who typi- 
cally must administer several hundred, and possibly thousands, of 
groups. Thus, computational efficiency and automation requirements 
rule out those procedures commonly used to perform detailed statis- 
tical analyses of individual time series. (See for example, the methods 
described in Ref. 4.) 


2.4 Summary 


In summary, the short-term trunk forecasting algorithms considered 
in this study should be able to track the trunk group load series 
accurately and adapt to dynamic changes in the demand pattern. In 
addition, they should perform adequately after a minimal initialization 
period and be computationally efficient. 

In the next section, we describe a class of algorithms satisfying these 
requirements. 


lll. LINEAR DYNAMIC MODELS OF SEASONAL TIME SERIES 


In this section, we consider the representation of trunk group load 
histories exhibiting seasonal variation by a linear, dynamic time-series 
model. The motivation for considering such a model for the application 
considered in this paper is discussed in Ref. 5. T’o summarize, this 
formulation is sufficiently general to describe a large number of time 
series of practical interest and is compatible with certain computation- 
ally efficient estimation techniques that make optimal use of limited 
amounts of data. The estimation and forecasting of these models will 
be considered in Section IV. 


3.1 Mathematical model 


A discrete time series is a sequence of observations of some quantity 
of interest. We think of such a series as being a realization of some 
stochastic process {y;}, which serves as a mathematical model explain- 
ing the observations, and which allows us to make inferences concern- 
ing future values of the series. 

In the linear dynamic model, the behavior of the series is determined 
by an n-dimensional state-vector process {X;} and the following two 
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equations that describe the time evolution of {X;} and the relation of 
X, to the observation yz. | 
The first equation, called the system equation, is 


Xi41 = GX: + Wi, (1) 


where @; if ann X n “transition” matrix and {W;} is an n-dimensional, 
zero-mean white noise process with 


JO iff st 
E(W.W;) = OP eee (2) 
More generally, we can consider the system equation 
Xeaa = GX: + Ur + Wi, (3) 


where U; is an n-dimensional deterministic or stochastic input to the 
system at time @. 

The matrix ¢; in eqs. (1) and (3) describes the deterministic move- 
ment of the state variables comprising X;. The white noise process 
{W,} explicitly allows for random variations in these state variables 
and, therefore, significantly enhances the flexibility of this formulation. 

The second equation, called the observation equation, is 


Si= H,X, + Et, (4) 


where H, is a k X n “observation” matrix and €; is a k-dimensional, 
zero-mean, white-noise vector sequence, independent of {X,}, with 


[0 if sxt 
: Btee) = |, ips! ce 


Thus, the observation y; is a linear function of the state variables 
corrupted by an additive disturbance e,. 

Since we will only consider univariate time series { y;} in this paper, 
we will assume henceforth that k = 1. 


3.2 Examples 


Next, we present a few simple, but relevant, examples to demonstrate 
that this formulation describes many kinds of time series. Additional 
examples can be found in Ref. 6. 


3.2.1 Linear growth model 


In many applications in which a quantity is to be forecast over a 
relatively short time span, it is reasonable to assume that the quantity 
is growing approximately linearly with time. In particular, the spa’ 
design assumes that busy-season trunk group load varies from year to 
year along a nearly linear trend. 

Linear growth can be described by the two-dimensional linear model 
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with the following parameters: 


Boao anh, == 
m= 6=| 5 . H, = H = [1, 0]; 


(1) 
x 
X; = | (6) 
In this model, x!” represents the level of the series at time ¢, and 
x” the instantaneous rate of growth. The inclusion of the white noise 


process described by {Q;} allows the trajectory to deviate randomly 
from a straight line. 


3.2.2 Seasonal models 


A time series {y;} is said to exhibit seasonality if observations 
separated in time by some fixed interval (usually one year) exhibit 
similar behavior. The simplest kind of seasonality is periodicity; that 
IS ¥: = Yrane for some positive integer L and every integer n. Two linear 
models of periodic behavior are described below. 

A useful model of seasonal variation is available via the trigono- 
metric representation for periodic sequences.’ Let { y:} be periodic, 
with period L. Then { y;} can be represented as follows: 


L~1 


2 
Me = a1 + 2 [a2;cos( jwt) + aj+isin(Jwt))] + ar(—1)', (7) 
j= 


where w = 277/L and the last term is omitted if L is odd. Hereafter, we 
will assume that L is even. The coefficients {a} in the trigonometric 
expansion are determined by 


1 L 
n= L >> Yi 
1=1 
9 L 
012) = 2 yiCOS(tJw) 
9 L 
Agj+1 = L 2 yisin(tJw) 
= yn (8) 
are L i= a 


Note that the first coefficient a; represents the average level of the 
series {y:}. It can be omitted from the expansion (7) if we are 
representing a series having zero mean. 

The existence of the representation (7) follows from the fact that 
the set of time functions 


F = [1, cos wt, sin wt, ---, (—1)'] 
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forms an orthogonal basis for the linear space of all real, periodic 
sequences with period L. Also note that for every integer 7, the set of 
7-translates of Fo, defined by 


F., = [1, cos w(t — 7), sin w(t — 7), ---, (—1)"7] 


also has this property. Therefore, for each r there exists a representa- 
tion of { y,} as a linear combination of the elements of F,; 


y= al + > {aS? cos[ jw(t — 7)] + af}ei sin [jw(t — 7)]} 


+ en (9) 


with af” = a, as in eq. (7). 

The expansion of the series { y,} relative to the set of functions F, 
can be viewed as a representation in which ¢t = 7 serves as the new 
time origin. In the linear dynamic model described below, at each new 
time epoch ¢ = + + 1 we will “rearrange” the representation (9) in 
terms of the functions F,.; via a simple, linear transformation of the 
coefficients {a”}. 


Let y= ai” + © [as cos ju(t — 7) + aff sin jw(t — 7)] 
J 


+ af (—1)*" 
af z {af*” cos jw[t — (7 + 1)] (10) 


+ asp sin jwlt — (7 + DJ} + af? (-1)¢*, 
(11) 


By expanding each term in eq. (11) and equating coefficients of similar 
terms with the coefficients in eq. (10) yields 


a” = aft) 
(r) c+ 
a 7 
Qoj+1 X92; 
ay? = (-Lal*”, (12) 
where 
-1 _ | cosjw —sin jw 
ey es jw cos a (13) 
Let X, = [al”, ---, afY. (14) 
Then 
X, = o X,+1, (15) 
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where @~’ is the block diagonal matrix defined by 


1 
oi 0 
2" 
ot = (16) 


Equivalently, 
X41 = oX,. (17) 


To summarize, the trigonometric expansion (7) of {7} can be 
represented by a linear dynamic system with state vector X, = 
fai”, ---, af] comprising the coefficients in the expansion relative 
to the current time +r. The transition matrix @, defined by eq. (16), 
provides the mechanism by which the coefficients at time 7 + 1 are 
obtained from those at time 7. Finally, at time 7, we observe 


yr = ay + p> as) + af) +, 
j 


= HX, + «, | (18) 

where the L-dimensional vector H is given by 
H = H, = [1101---01]. (19) 
By including a nonzero white-noise input in the dynamics (16), ie., 
X,+1 = OX, + W,, (20) 


we can allow for random variations in the trigonometric coefficients, 
and therefore, in the seasonal pattern. 

In addition to the trigonometric model described above, another 
commonly used seasonal representation, the seasonal index model 
(Ref. 6, page 217) was considered. However, empirical performance 
results, analogous to those to be presented in Section V, indicated that 
the trigonometric model gave somewhat better results. For the sake of 
brevity, this model will not be discussed further. 


3.3 Seasonal trunk group model 


In Section 3.2, we showed how the general linear model eqs. (1) to 
(4) can be used to represent either linear-growth or periodic time 
series. We now demonstrate that these models can be superimposed to 
form a model capable of representing a wide variety of trunk group 
load series. 

Consider again the trunk group load histories shown in Fig. 1. These 
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series may be characterized by an underlying linear trend with a 
seasonal pattern superimposed. (This observation is consistent with 
the assumptions used in the design of spa.” See Section 3.2.1.) The 
former component can be represented by the linear growth model 
described in Section 3.2.1; the latter, by the periodic model described 
in Section 3.2.2. If we assume that these two effects combine in an 
additive fashion, we can represent the observed behavior by the linear 


model with state vector 
x? 
x= | (21) 
x 


where X!” is the state vector defined for the linear growth model (6) 
and X}° is that of the periodic model (14). (In the seasonal component 
X{*, the “level” term a{” included in eq. (7) is omitted, since a similar 
term appears in the linear growth component X!”.) 

The dynamics of this system are described by the transition matrix 


(2) 
o-|% so (22) 


where }” and 6“ are the transition matrices of the linear and seasonal 
models, respectively. 
Similarly, the observation matrix, H, is given by 


H = (H:H *)], (23) 


which decomposes the observation into the sum of a trend component 
HX! and a seasonal component, H“)X{’. 

Finally, random variation in the components of X; is modeled as a 
white-noise input, described by the sequence {Q,}. In particular, if the 
disturbances to the trend and seasonal components are uncorrelated, 
then Q; may be decomposed as 


(2) 
9, = i ay 


3.4 Summary 


In this section, we discussed the class of linear dynamic time-series 
models and showed how such models can be used to represent time 
series exhibiting both linear growth and seasonal variation. In partic- 
ular, a specific model was proposed in Section 3.3 for the representation 
of trunk group load histories exhibiting seasonality. 

In the next section, we consider the problem of estimating the 
parameters of such a model from the observed time series and the use 
of these estimates in forecasting. 
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1V. PARAMETER ESTIMATION AND PREDICTION 


Complementing the class of linear dynamic time-series models is a 
sequential parameter estimation algorithm known as the Kalman 
filter.°° With this technique, we estimate the model parameters com- 
prising the state vector X, from the observations yi, ---, jy; and use 
this estimate to make inferences concerning future values of the series 
(prediction). 

The general procedures by which this is accomplished are discussed 
in this section. We begin by describing a recursive algorithm for 
sequentially updating the state vector estimate as new data becomes 
available. The recursion is initiated by a weighted least-squares esti- 
mation procedure derived in Section 4.2. The application of these 
procedures to the seasonal trunk group model derived in Section III 
yields a forecasting algorithm capable of accurately tracking and 
predicting the values of the seasonal time series considered in this 
study. 

The performance of this algorithm on actual trunk group data will 
be examined in Section V. 


4.1 The Kalman filter 


Consider the linear dynamic model described by the equations 
Xai = bX + Wz (24) 
and 
= HX; + &. (25) 
Assume that 
E(W,) = 0, E(e) = 0, 


Btwiw:]= |? if st 


O: a eee (26) 


E[Wies] = 0 
and 


0 if s#t 
lees] = i. if s=t. 


Suppose that at time /¢, prior to observing y;, we have available at 
initial estimate of the state vector X;. Calling this prior estimate Xi" 
let us also postulate that the estimation error X, = X,,-: — X; has zero 
mean and is uncorrelated with e&, the observation error in y;. Let us 
also assume that the error covariance matrix of the estimate, 


*In general, the notation x. will be used to denote an estimate of X,; based on 
information available at time s. 
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Pr-1 = E(X,X}), (27) 


is known. 

When the current observation y; becomes available, we want to 
combine the prior estimate with the new observation in an optimal 
manner. Specifically, we seek unbiased linear estimates XAacs k= 0, of 
the current and future states, which minimize the quantities trace 
(Pi+z2), Where P;+,7 1s the error covariance matrix of Xr455. Thus, we 
are seeking minimum variance, unbiased linear state-vector estimates. 

The following procedure, which yields such estimates in a recursive 
manner, is known as the Kalman filter.” 


4.1.1 Filtering 


The problem of determining the optimal estimate X,, from data 
available at time ¢ is known as the filtering problem. Its solution is 
given by 


Xie = Kea + KL yy: — HeXee-1], (28) 
where the optimal gain K,, is determined as 
K, = Py-1H [HH Pie-1Ht + Rif. (29) 
The error covariance matrix of this estimate is given by | 
P,, = (I — KH) Pi:-i(I — KeHy)’ + KR KK’, (30) 
= [1 — KH, JPi:-1 (31) 


The more complex expression (30) 1s included since this relationship 
is valid for an arbitrary gain matrix K;, whereas (31) is valid only when 
K, is the optimal gain (29). 


4.1.2 Prediction 


Having obtained the optimal linear estimate X,, of X, from the data 
available at time ¢t, we now want to predict future values of the state 
vector and the series. These are obtained by extrapolation, using the 
linear dynamics relationship (24). 

That is, the optimal linear estimates of X;4; and y+x, based on data 
available at time ¢, are given, for k > 0, by 


k 
Xie, = Tl deo Xie (32) 
i=1 | 
and 
Verkt = HiseXe+he- (33) 
In particular, the optimal 1-step predictors are 
Ki = bX, (34) 
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and 
Verie = He Xe4 1,t- 


The error covariance matrix of X41, 1s 


Prsie = O:Pirdt + Qs (35) 
the one-step forecast error variance is 
var (Ve41,0) = Ae Pri eH 141. (36) 


4.1.3 Sequential estimation 


We nowshow how the results of Sections 4.1.1 and 4.1.2 can be used 
in an efficient procedure for processing the observations { y;} to obtain 
estimates of the parameters of the underlying linear model. 

First note that the 1-step prediction X,+1, given by eq. (34) satisfies 
the requirements of the initial estimate of the state vector at time 
t + 1. Thus, we can use the filtering procedure described in Section 
4.1.1 to process the next observation, 41, when it becomes available. 

In general, starting with an initial state estimate X,,,,-1 at some time 
to, we can alternately apply the procedures described in Section 4.1.1 
(filtering) and Section 4.1.2 (prediction) to process subsequent obser- 
vations ¥;,, Y+1. °** In an efficient, recursive manner. The procedure 
is summarized below. 


Inputs: Initial estimate Ress with (known) error covariance matrix 
P,,.4.-1- Set t= to. 

1. Compute the Kalman gain matrix K,, using P;:-1 and eq. (29). 

2. Use the current observation, ¥;, and eq. (28) to compute the updated 
state-vector estimate, Kes: The error covariance matrix of this 
estimate, P;:, is given by eq. (31). 

3. Obtain X,+1, and P,+1, using eqs. 34 and 35, respectively. 

4, Sett=t+1.Gotol. 

We make the following observations regarding 1mplementation of 
the algorithm described above. 

First, at each time ¢, all relevant information concerning the series 
is embodied in the state-vector estimate X,,. It is not necessary to 
store the history { 71, --+ , ye}. 

Second, the optimal gain sequence {K,;} is determined by the second- 
order statistics {Q,} and {R,} describing the variability of the process 
and the measurements, respectively. Thus, if these quantities are 
known in advance, the gain sequence {K;} can be precomputed. 

Third, to start the recursion, an initial state estimate with known 
error covariance Is necessary. The problem of obtaining such an 
estimate is considered below. 
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4.2 Initialization 


In many Kalman filter applications, it is important to have an initial 
estimate of the model state vector as soon as the observations 
¥1, ¥2, ++ begin. Since this estimate, X10, is made prior to observing 
the series { y,}, either a judicious guess must be made or information 
from an external source must be provided. In addition, the accuracy of 
this estimate must be quantified via the covariance matrix P. 

For complex models, the specification of good initialization param- 
eters can be difficult. Also, improper specification can result in poor 
initial performance, because of improper weighting of the first obser- 
vations. Both of these problems can be avoided completely if we can 
afford to wait until sufficient observations have been made that an 
initial state-vector estimate can be based on the data alone. 

Since the short-term forecasting application considered in this paper 
is for use as a supplement to the yearly busy-season trunk forecast, it 
is neither necessary nor desirable to consider the use of the short-term 
forecast until sufficient data is available to make accurate predictions. 
For this reason, we recommend that the series { y:} be observed over 
an initialization period, say from ¢ = 1 to t = T, so that the initial state 
estimate can be based on the data alone. The method by which this is 
accomplished is described below. 


4.2.1 Linear model 


Consider the linear dynamic system described by the equations 
Xi = OX + W; (37) 
and 
= HX, + e;. (38) 


We assume that the matrices H and @ do not vary with time,* and 


that the latter is invertible. 

We will show that each observation y7_; within the initialization 
period can be expressed as a linear function of the state vector Xr at 
the end of the period, of future disturbances W7-z4;, and of €7_x, the 
observation error at time t = T — k. 

We first note that, from eq. (37) 


Xr-1 = o (X: = W.--1), (39) 
so that 


* This assumption is not necessary for the results of this section; however, it allows 
sufficient generality for the model considered and will simplify notation. 
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yr = HX, + er 
yr-1 = H[O Xr — 6 Wr-a] + era 
yr-2 = H[o "Xr — 6 Wr — @ Wr-2] + €7-2. 
In general, for 1 = k < T, 


k 
YT-2 = Ho “X7 —H > oeown| + €7_p. (40) 
i=1 
In matrix notation, 
Yr= (yz, +++, yi)’ = OrX7 + O72, (41) 
where 
H 
Or; = Ho" j (42) 
Ho 7*? 


Or= (87, --+, 4)’, andfor 1=k<T, 
k 
Grp =— 2 He" Wer + €7T_p. (43) 
i=1 


Equation (41) expresses the first T observations as a linear function 
of X7, plus an additive vector O7 of zero-mean, correlated noise terms. 
Correlation between the components of O7 is given by the covariance 
matrix 


V= (Ujx) = E[9797]. 
Under the set of assumptions (26) the covariances vu;,, for 7 = k, are 


given by 


jz = Cov(Or-j41, Or get) 
j-l 


a > E[He?*'Wr-i|[Ho?"'W-_i]’ 
i=] 
+ errs +1) (44) 
= H¢’’Qr-i(Ho"“)’ + dRz. (45) 


4.2.2 Minimum variance initialization estimate 


Using the linear model (41) relating the first T observations to the 
value X7 of the state vector at the end of the initialization period, we 
can estimate Xr by weighted least-squares. That is, if the matrix ®7 
defined in eq. (42) has rank n,* the minimum variance, unbiased linear 


* For the linear growth, seasonal demand model defined in Section 3.3, the matrix 
(42) has rank n if T > n, the dimension of the state vector. That is, the matrices @ and 
H define an observable linear system.? 
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estimate of Xr is given’ by 


Xor = [(@fV"'O7) OV "Yr. (46) 
The error covariance matrix, P77, is given by 
Prr = [®7V'®7]". (47) 


Using eqs. (34) and (35), we can extrapolate these quantities to time 
t= T+ 1. That is, 


Reais = oXrr (48) 
and 
Priir = oPrrd’ + Qr. (49) 


Using eq. (48) as the prior state-vector estimate at time T' + 1, we can 
process subsequent observations yr+i, yrse, «++ sequentially, as de- 
scribed in Section 4.1.1. | 

To summarize, we have shown that an initial state-vector estimate, 
satisfying the conditions of Section 4.1.1, can be obtained from the 
linear model (41) using weighted least-squares. This estimate can then 
be used to start the recursive algorithm summarized in Section 4.1.3. 

We will now discuss some general considerations regarding the 
application of these procedures to time-series forecasting. 


4.3 Implementation considerations 


So far in this section, we have shown how the weighted least-squares 
method can be used to obtain an initial state-vector estimate at time 
T, which is sequentially updated by the Kalman filter algorithm as 
new data becomes available. The application of each of these proce- 
dures is predicated on the knowledge of the second moments {Q,;} and 
{R.}, which describe the variability of the state-vector process and the 
observations, respectively. In practice, however, these quantities are 
rarely known. Therefore, before we can apply these methods to an 
observed time series, the problem of determining appropriate values 
for these parameters must be considered. 


4.3.1 Specification of Q, and R, 


Various procedures have been proposed for the on-line identification 
of the parameters {Q,;} and {R,}. (For example, see Ref. 11.) However, 
such an approach would add considerable complexity to the estimation 
procedures described in Sections 4.2 and 4.3 and, for the relatively 
short time series associated with the application considered in this 
paper, would probably yield little improvement in performance com- © 
pared to the much simpler alternative, discussed below. 

Instead of trying to estimate the parameters {R;} and {Q,} describ- 
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ing the stochastic structure of each individual series, it may be possible 
to determine a single set of parameters, say [R7'] and [Q?]}, which 
adequately approximates the true stochastic nature of the ensemble of 
time series being considered. That is, if within the domain of applica- 
tion, the performance of the procedures described in this section is 
relatively insensitive to deviations for the assumed values [Q;] and 
[Ri], then the use of a single set of parameters can be justified. Also, 
using this approach, a single gain sequence {K;} and initialization 
matrix (46) can be precomputed and applied to all series. Thus, 
implementation is greatly simplified. 


4.3.2 Truncated gain sequence 


Another simplifying approximation is available for implementing 
the Kalman filter algorithm, in which the optimal gain sequence {K,} 
is replaced by a simpler sequence, {K*}. For example, the truncated 
gain sequence, defined as 


Kriz if t=rT 
Ki, = (r => 1) (50) 
Kr, if L=T 


can often be used with good results. Two advantages of the truncated 
sequence over the full, optimal sequence are discussed below. 

First, if 7 is relatively small, only a few gain vectors {Krii, -:-, 
K7+,} need to be computed and stored. 

Second, and more important, is the use of the truncated gain 
sequence to avoid poor filter performance resulting from inaccurate 
specification of the parameters {Q,}. This is demonstrated in Fig. 2, 
which shows the theoretical performance of three seasonal trunk group 
algorithms based on the model described in Section 3.3. 

In this example, the gain sequence has been computed under the 
false assumption that Q; = 0+, when in fact, Q; > 0. Under this 
assumption, the full gain sequence {K;} converges to zero. Thus, new 
observations y; are given insufficient weight in eq. (28) to allow the 
filter to track the randomly varying process. 

Also shown in Fig. 2 is the theoretical performance of the truncated 
gain sequence (7 = 1), again computed under the false assumption that 
Q; = 0. In this case, however, the gain is held at a constant, nonzero 


+ Actually, it is sufficient to determine appropriate values for {Q;}, since the gain K, 
and initialization matrix (46) depend only on the relative magnitude of Q,; compared to 
the scalar R;. For this reason, we will hereafter assume that R, = 1. 

+ The assumption that Q, = 0 implies that the process {X,} evolves in a deterministic 
manner relative to a constant set of parameters, say Xo. In this case, the minimum- 
variance estimates coincide with the usual (fixed-parameter) least-squares estimates. 
Thus, Fig. 2 illustrates the pitfalls of using such methods, when in fact, the underlying 
process is changing with time. 
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Fig. 2—Theoretical algorithm performance. 


value, and the filter is able to track the series in a nearly optimal 
manner. 

The main point being illustrated here is that, while correct specifi- 
cation of the parameters {Q,} in the gain computation gives the filter 
sufficient responsiveness to track the process in an optimal manner, 
approximately the same effect can be achieved by truncating the gain 
sequence at some nonzero value. Thus, the truncated gain sequence is 
a useful implementation technique when a statistical description of the 
variability of the process is unavailable. 


4.4 Summary 


This section described the use of minimum variance, linear estima- 
tion procedures to estimate the parameters of a linear dynamic time- 
series model. By observing the process over a sufficiently long initial- 
ization period, an initial state-vector estimate can be obtained by the 
weighted least-squares method. Using the Kalman filter algorithm, 
this initial estimate can then be updated in an efficient, recursive 
manner as new observations become available. 

In addition, we discussed the problem of obtaining good, suboptimal 
estimates when the exact statistical structure of each individual series 
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is unknown. The application of these techniques will be illustrated in 
the next section, where we examine the performance of the proposed 
algorithm on actual trunk group data. 


V. ALGORITHM PERFORMANCE-EMPIRICAL RESULTS 


In Section III, we proposed a seasonal time-series model for trunk 
group load- histories. We then described, in Section IV, a general 
procedure for estimating the parameters of such a model and for 
predicting future values of the time series. The performance of the 
resulting forecasting procedure on actual trunk group data is studied 
in this section. 

We begin this section by describing the objectives that motivated us 
to undertake the empirical study described herein. 


5.1 Study objectives 


We wanted to use Boc trunk group data to determine the effective- 
ness of the proposed forecasting procedure in the three areas discussed 
below. 


5.1.1 Forecast accuracy 


The primary goal of the short-term trunk forecast is to obtain an 
accurate view of the approaching busy-season based on observed pre- 
busy-season traffic levels. In contrast, the yearly busy-season forecast 
(e.g., SPA) considers only the quantities of direct interest, namely, the 
time-series of yearly busy-season (peak) loads. While the short-term 
forecast has the advantage of operating with a shorter lead time and 
uses more frequent observations, the potential improvement, if any, in 
forecast accuracy over the busy-season forecast should be quantified. 


5.1.2 Predictability 


In addition to forecast accuracy, we would also like to investigate 
the degree with which service problems reveal themselves prior to the 
busy season. That is, how effective is a short-term forecast in predicting 
such events? 


5.1.3 Error characterization 


The intended application of the short-term forecast is to provide 
information regarding the adequacy of the planned trunk level for 
each trunk group. As measurements are collected and short-term 
forecasts are generated, the servicer may decide to 

(t) take no action, 

(tz) augment the group immeditately (“demand servicing’’), or 

(tii) augment the group in the near future (“anticipative demand 
servicing’’). 


SHORT-TERM FORECASTING 85 


The removal of excess trunks is the responsibility of the yearly 
planned-servicing activity recommended by the yearly trunk forecast. 
No action should be taken unless the current or anticipated level of 
demand exceeds the in-service capacity. 

The effective solution of the “anticipative demand servicing” deci- 
sion problem requires that the uncertainty regarding the short-term 
forecast be understood and quantified. That is, servicing action should 
be taken only when, with a fair degree of certainty, action is required. 


5.2 Methodology 


In principal, the quantities of interest defined above can be deter- 
mined (either analytically or by simulation) from the statistical struc- 
ture of the time series being considered. In practice, however, the true 
underlying structure and dynamics are unknown. Therefore, to obtain 
meaningful answers to the questions posed above, we must test the 
performance of our forecasting procedures on real data. 


5.2.1 The data 


Historical trunk group data is retained in computer accessible form 
by most Bocs in the extended administrative history files of the Trunk 
Servicing System (Tss). Unfortunately, the longest histories available 
today consist of only about four and one-half years of data, and are 
available in those companies that made early use of this feature. 

In the Tss system, trunk group traffic measurements are averaged 
and reported over “study periods” consisting of up to 20 business days. 
Roughly speaking, each such study period represents a rolling average 
of the four most recent weeks of valid data. For grade-of-service trunk 
groups, the busy-season corresponds to the study period (within the 
year) that has the highest offered load. This is the quantity that we 
wish to forecast. 

Although we could attempt to model, track, and predict the complete 
time series of study period loads, this level of detail is neither necessary 
nor desirable. Instead, we can partition the year into some number L 
of “forecast periods,” select the largest study period load within each 
forecast period, and work with the resulting time series of forecast 
period loads. Since the study period load series and the forecast period 
load series have the same maximum value in each year, it is sufficient 
to predict the latter. 

The choice of the parameter L, the number of forecast periods per 
year, determines a trade-off between model complexity and response 
time. For large values of L, the seasonal pattern and time-series model 
required to accurately represent it are complex; also, data must be 
collected frequently. However, since the observations occur more 
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frequently, it is possible to react more quickly to significant changes in 
the series {y;}. 

For the intended application, 1t was decided that by partitioning the 
year into eight short-term forecast periods, a reasonable balance 
between system complexity and response time would be achieved. 


5.2.2 Selection of groups 


To test the performance of the proposed algorithm in a controlled 
operating environment, it was necessary to select from the Boc data a 
subset of those trunk groups having reasonably “clean” data. That is, 
we omitted all groups whose histories exhibited one or more of the 
following characteristics. 

(t) No discernible pattern: The load histories of certain trunk 
groups appear to have neither a within-the-year pattern nor a general 
growth trend. (This frequency occurs on groups serving very small 
volumes of traffic.) For these groups, our model is inappropriate; 
trending the yearly peaks (as in SPA) appears to be the most reasonable 
approach. 

(zi) Very short histories: Because new groups begin and old groups 
leave service, the amount of available data varies among groups. We 
considered only those groups having at least four years of valid data. 

(tit) Missing data: A group was not considered if, within any of the 
32 forecast periods comprising the first four years, no valid study 
period load measurement was available. 

(tv) Deterministic changes in the series: For certain groups, it was 
apparent that a major, deterministic change (e.g., a main station 
transfer) had drastically affected the load history. In principal, such 
changes can be accommodated by the model (8); however, information 
concerning such occurrences was not available in our data base, so 
these groups were not considered. (As we mentioned in Section 2.2, 
the ability to react appropriately to such occurrences is highly desirable 
for the intended application and will be considered in future work.) 

A sample of approximately 300 trunk groups whose load histories 
satisfied conditions (i) to (tv) was selected for use in our study. 


5.2.3 The forecasts 


For each trunk group, the seasonal algorithm was initialized using 
the procedure developed in Section 4.2 and the first two years of data 
(16 data points). As we mentioned earlier, the application of both the 
minimum variance initialization and the Kalman filter algorithm re- 
quires specification of the matrix Q, describing the variability of the 
state-vector process. This quantity was left as an adjustable parameter 
in the study. 

The error covariance matrix Pr+1,r of the prior estimate Xr41,r was 
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computed from eq. (49), using the error covariance matrix (47) of the 
initialization estimate X77. Then, eq. (29) was used to compute the 
gain sequence {K71, Krio, ---}. Since we decided to use a single value 
of {Q,} for all trunk groups, the same gain sequence and initialization 
matrix could be computed once and applied to each time series. 

Although the seasonal algorithm was designed to track the entire 
time series of forecast period loads, the principal quantity of interest 
for grade-of-service trunk groups is the yearly peak (busy season) load. 
To determine the accuracy with which the seasonal algorithm predicts 
this quantity based on data available k-periods prior to the busy 
season, we compared the busy-season load with the maximum value of 
the series predicted by our algorithm. 

To compare the performance of the seasonal forecasting algorithm 
with spA, both algorithms were run in parallel on the same set of data. 
However, since the information to be used in initializing the SPA 
(aggregate growth rate estimates for the offices on which the trunk 
group terminates”) was not available in our data, an alternate proce- 
dure had to be used. 

To simulate the sPaA initial estimates based on aggregate growth rate 
information, the busy-season loads were summed over all groups in 
each of the first two years of data. The ratio of the second-year sum to 
the first-year sum was used as the aggregate growth ratio R, with 
R = 1.0325 for the ensemble of trunk groups considered in the study. 
For each group, the busy-season load in the first year was used as its 
initial level estimate, <§?, and the initial growth increment estimate 
was 


LP = (R— 129. 
5.3 Examples 


Before discussing the selection of algorithm parameters and the 
corresponding aggregate accuracy statistics, let us observe the perform- 
ance of the two forecasting procedures on a few of the trunk groups 
considered in this study. Along with the aggregate statistics, which 
quantify the average improvement in accuracy provided by seasonal 
forecasting, these examples identify a number of general cases in which 
seasonal forecasting offers dramatic improvements over the yearly 
trending method of SPA. 

Let us again consider the two example trunk group load histories 
discussed in Section 2.1. Figure 3 shows the load history of trunk group 
AA001444, except that now the results of the seasonal forecast are also 
shown. Recall that the first two years of data are used to obtain the 
initial state-vector estimate by a weighted least-squares procedure. 
The initialization is illustrated by the trajectory of circles, which 
represents an extrapolation backward in time of the state estimate at 
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Fig. 3—Trunk group AA001444—-seasonal forecast. 


the end of the initialization period. Close examination reveals that the 
trajectory more closely fits the data at the end of the period than at 
the beginning; this is consistent with the fact that we are trying to 
estimate the current state. 

The sequence of triangles beginning where the circles leave off is 
the sequence of one-step ahead seasonal forecasts obtained using the 
seasonal Kalman filter. It is evident that the seasonal algorithm is 
closely tracking the seasonal variation and underlying linear trend of 
this group and that SPA is predicting well the busy-season peaks. In 
fact, both procedures appear to perform equally well, on average, in 
predicting the yearly peak load. 

A more interesting example is given in Fig. 4, which shows the 
performance of both algorithms on trunk group AA024225. As we 
mentioned earlier, the underlying growth pattern for this group ex- 
hibits a significant change in trend, which causes both algorithms to 
overforecast the peak load in the third year. However, by observing 
subsequent off-busy-season data points, the seasonal algorithm is able 
to detect this change, quickly home-in on the “signal,” and give a 
reasonably good forecast of the busy-season load in the fourth year. In 
contrast, the yearly spA forecast continues to be far off, since at the 
time the year-four forecast 1s made, only a single data point (the year- 
three busy-season load) off the previous trend line has been observed. 
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Fig. 4—Trunk group AA024225—seasonal forecast. 


This example dramatically illustrates the potential value of the inter- 
busy-season information exploited by the seasonal algorithm. 

A third interesting example, more in line with the intended antici- 
pative demand servicing application discussed in Section 1.1, is given 
in Fig. 5. In this example, it appears that the trunk group load has 
undergone a moderate change in level and trend in the third year. The 
change occurs suddenly,* causing both algorithms to forecast signif- 
icantly low in the third year. However, after processing subsequent 
data, the seasonal algorithm is able to forecast the next busy season 
with remarkable accuracy. In contrast, the SPA forecast again falls 
significantly low. 

Finally, consider the trunk group shown in Fig. 6. In the second year, 
the busy-season load lies significantly above the trend for the other 
three years. This “outlier,” which falls within the interval used by the 
SPA algorithm to screen for outliers, is accepted as a valid data point 
and processed accordingly. This causes SPA to “overshoot” the busy 
seasons in the next two years. In contrast, the seasonal algorithm, by 
processing the inter-busy-season data, recovers quickly and provides 
accurate forecasts in the next two years. 


* Such a change, if known in advance, can be input by the forecaster, and the forecast 
can be changed accordingly. However, such “deterministic events” may be overlooked 
by the trunk forecaster, who typically has responsibility for a large number of groups. 
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Fig. 5—Trunk group AA017330—seasonal forecast. 


The significance of this fourth example hes in the additional protec- 
tion against outliers provided by the seasonal forecast. If, in this 
example, the second spA data point had been unusually low, it is likely 
that the spa algorithm would have forecast significantly low in the 
next two years, possibly resulting in demand servicing activity. In such 
a case, the seasonal algorithm could be used to predict the service 
problem prior to its realization, or to estimate the additional capacity 
needed to relieve the problem when it occurred. 

These four examples, while not constituting a meaningful statistical 
sample, dramatically illustrate a number of situations likely to occur 
in practice, where seasonal forecasting offers significant advantages 
over the yearly forecast of busy-season demand. 

Before comparing the average accuracy results obtained for the SPA 
and seasonal forecasts, we will discuss the selection of the specific 
algorithm parameters used in the study. 


5.4 Parameter selection 


Recall from Section 4.3 that the ability of the algorithm to track 
random variations in the components of the state vector can be 
enhanced by either 

(i) including a nonzero matrix Q,; that explicitly describes the 
variability of the parameters (Section 4.3.1), 


SHORT-TERM FORECASTING 91 


O TRUNK GROUP OFFERED LOAD 
A 1-STEP SEASONAL FORECAST 
® 
O 


1-YEAR SPA FORECAST 
INITIALIZATION 


OFFERED LOAD (CCS) 





1975 1976 1977 1978 1979 1980 


Fig. 6—Trunk group AA043161—seasonal forecast. 


(ti) the use of certain heuristics, such as the truncated gain se- 
quence (Section 4.3.2), or 

(itz) some combination of (z) and (iz). 

The effectiveness of these techniques was determined through ex- 
perimentation. Specifically, for each version of the algorithm that we 
tested, the steady-state forecast error variance was estimated for each 
group and summed over all groups tested. With Q; = 0, good results 
were obtained using the truncated gain sequence (50) with 7 = 1 (ie., 
the first term K7,, of the Kalman gain sequence was used to process 
each data point after initialization). Consistent with the hypothesis 
that some allowance must be made for random variation in the model 
parameters, this constant gain-vector sequence out-performed the full 
gain sequence computed under the assumption that Q, = 0. Since the 
constant gain sequence also offers certain simplifications in algorithm 
implementation, this approach seemed very attractive. 

By considering the behavior of various trunk group time series (as 
we discussed in Section 5.3.1), it also became clear that the seasonal 
forecasting algorithm should be able to follow variations in the under- 
lying linear trend (see Fig. 2b). In addition to the responsiveness 
obtained through the use of the truncated gain sequence, additional 
responsiveness in the trend parameter is obtained by including a 
positive entry gz: in the matrix Q,; = (qi;). The optimal value of this 
parameter was determined empirically. 


92 THE BELL SYSTEM TECHNICAL JOURNAL, JANUARY 1982 


Using the average forecast error statistics, we then compared the 
performance of the resulting seasonal algorithm with that of spa. The 
results are discussed below. 


5.5 Accuracy 


To quantitatively measure the performance of both the seasonal and 
the spa forecasting procedures, both algorithms were used to forecast 
the busy-season loads on each of the trunk groups considered in the 
study. The results were compared using the relative error statistic, 
defined below. 

Let ag be the measured busy-season trunk group load in a given 
year, and a, a forecast (obtained by either method) of this quantity. 
Then the relative forecast error is defined to be the quantity 


a a 

Ag 
To compare the approximately steady-state performance of the algo- 
rithms, this statistic was computed for each busy-season forecast in 
the fourth year. The results are shown in Fig. 7, which shows the 
distributions of the SPA and seasonal forecast errors, and in Table I. 


The latter compares the accuracy statistics of the seasonal forecasts, 
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Fig. 7—-Year 4 forecast error distributions. 
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Table I—Year 4 busy-season forecast results 


Percent 
Average ab- 

Forecast Bias solute Error rms Error 
1-Year SPA 3.5 10.3 13.5 
1-Step seasonal —0.7 5.1 7.5 
2-Step seasonal —0.8 6.1 8.5 
3-Step seasonal —0.5 6.5 9.0 
4-Step seasonal 0.1 7.0 9.4 
5-Step seasonal 0.2 7.6 10.4 
6-Step seasonal 0.4 7.9 10.9 
7-Step seasonal 0.5 8.0 10.8 
8-Step seasonal 1.8 8.4 11.6 


made one forecast period prior to the busy-season, with those of the 
SPA one-year-ahead forecast. The results show a significant improve- 
ment in accuracy, using either the average absolute relative error or 
the rms relative error. In either case, the observed error in the seasonal 
forecast is approximately half as large as that of the spa forecast. Also, 
recall that these statistics, obtained by comparing the forecast with 
the measured load, also reflect load measurement error. Thus, on 
average, the seasonal forecast is off by less than 5 percent. 

Part of this improvement must be attributed to the fact that the 
short-term busy-season forecast is made approximately 1% months 
prior to the busy season, compared with one year for spa. Additional 
improvement can be attributed to the exploitation of the additional 
structure of the seasonal time series. The relative importance of these 
two factors is illustrated in Fig. 8, which shows the relationship 
between short-term busy-season forecast error and lead time. Begin- 
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Fig. 8—Busy-season forecast error. 
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ning at 5.1 percent for the one-step forecast, the average absolute error 
statistic is a concave function of lead time, increasing to 8.4 percent 
for the one-year-ahead seasonal forecast. This suggests that most of 
the improvement in forecast accuracy can be attributed to the seasonal 
algorithm’s ability to utilize recent data. 


5.6 Summary 


In this section, we described an empirical investigation of the per- 
formance of the seasonal trunk forecasting algorithm developed in 
Section IV. We showed that, on average, a short-term (one forecast 
period ahead) forecast of the approaching busy season is approximately 
twice as accurate as the yearly spa forecast. This improvement in 
accuracy deteriorates with forecast lead time, being only 20 percent 
more accurate than sPA when projecting a full year ahead. 

More important, we were able to identify a number of situations, 
likely to occur in practice, where the seasonal forecast significantly 
outperforms sPA. These situations tend to occur where unusual data, 
such as outliers or sudden, unanticipated changes in the growth pattern 
occur. By using the information provided by the inter-busy-season 
traffic levels, the seasonal algorithm is able to recover rapidly from 
such disturbances, and provide accurate forecasts in subsequent years. 
In contrast, the SPA algorithm, which processes a single data point 
each year, takes considerably more time to recover. 


Vi. SUMMARY AND CONCLUSIONS 


This paper has described a comprehensive study of the use of 
seasonal forecasting algorithms in the trunk servicing process. 

We began by discussing the basic requirements of a short-term trunk 
forecasting system and then described a general class of time-series 
models well suited for such applications. After developing a model for 
linear growth and seasonal demand, we considered various minimum 
variance procedures for estimating the parameters of such a model 
from a given time series. By observing the series over a two-year 
initialization period, an initial estimate of the model parameters is 
obtained by weighted least-squares. Subsequent observations are proc- 
essed by an efficient recursive procedure known as the Kalman filter. 
Because of the optimality of these algorithms, a minimum amount of 
data is required before accurate forecasts can be made. 

To verify the appropriateness of these procedures on actual trunk 
group data, an empirical investigation of the performance of the 
proposed algorithm was undertaken. The main conclusions of the 
study are discussed below: 


6.1 Accuracy 


On the average, the one-step (~1% months) ahead seasonal forecast 
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of the busy-season load is approximately twice as accurate as the year- 
to-year forecast provided by the spa. Most of this improvement can be 
attributed to the ability of the seasonal algorithm to utilize effectively 
recent observations. The one-year ahead, seasonal forecast error is 
only 20 percent less than that of SPA. 


6.2 Anticipative demand servicing 


The seasonal algorithm demonstrates the ability to respond quickly 
and accurately to various situations that tend to result in inaccurate 
sPA forecasts. Thus, the seasonal algorithm represents a potentially 
valuable tool for trunk network administration. In many cases, it can 
accurately predict that the currently planned trunk level is inadequate 
for the approaching busy-season demand level, and the appropriate 
trunk group augment. Conversely, it can also be used to identify 
potentially useful reserve capacity. 


6.3 Relation to SPA 


Since current trunk provisioning methods generally require that 
planned-servicing decisions be made nearly a year in advance, these 
results lead us to the conclusion that the greater simplicity and 
generality of the linear trending method of SPA makes that algorithm 
the appropriate choice for the annual busy-season trunk forecast. In 
the future, however, new technologies may make it both possible and 
economical to maintain the network in a near-optimal configuration 
through more-frequent planned-servicing adjustments. In that case, an 
accurate, seasonal forecast may be a better alternative for planned 
servicing. : 
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This paper presents the theory for a rapidly converging adaptive 
linear digital filter. The filter weights are updated for every new 
input sample. This way the filter is optimal (in the minimum mean 
square error sense) for all past data up to the present, at all instants 
of time. This adaptive filter has thus the fastest possible rate of 
convergence. Such an adaptive filter, which is highly desirable for 
use in dynamical systems, e.g., digital equalizers, used to require on 
the order of N* multiplications for an N-tap filter at each instant of 
time. Recent “fast” algorithms have reduced this number to like 10 
N. One of these algorithms has the lattice form, and is shown here to 
have some interesting properties: It decorrelates the input data to a 
new set of orthogonal components using an adaptive, Gram-Schmidt 
like, transformation. Unlike other fast algorithms of the Kalman 
form, the filter length can be changed at any time with no need to 
restart or modify previous results. It is conjectured that these prop- 
erties will make it less sensitive to digital quantization errors in 
finite word-length implementation. 


1. INTRODUCTION 


Gradient algorithms are widely used in adaptive tapped delay-line 
filters, such as equalizers, to derive a set of tap coefficients that gives 
the desired output with a minimum mean square error (mmse). It is 
widely recognized’ that when the input samples presented to the 
adaptive system are highly correlated, convergence to the optimum 
filter coefficients is slow. An important contribution to solving this 
problem of slow convergence was made by Godard? who obtained an 
adaptive algorithm that minimizes the total mse at all instants of time. 
Consequently, the Godard algorithm has the fastest possible rate of 
convergence in an mmse sense, and is usually referred to as the optimal 
mean-square adaptive estimator. This algorithm has the structure of 
a Kalman filter, and its complexity is on the order of N” multiplications 
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and additions per iteration, where N is the number of filter coefficients 
being adjusted. Fast convergence results when successive corrections 
to the coefficients’ vector are adaptively decorrelated. Based on this 
observation, other practical, less complex, schemes of orthogonalizing 
the corrections were proposed, see for example Ref. 1. Recently, an 
efficient (or “fast’?) computing procedure, called the fast Kalman 
algorithm, was obtained which provides a fast-converging estimator 
identical to that of Godard, but which requires only on the order of 10 
N multiplications.*** 

Another approach to accelerated convergence is to transform the 
input data to obtain uncorrelated inputs to the estimator.® When the 
characteristics of the channel are fixed and known, the transformation 
can be found from the data autocorrelation matrix. When this matrix 
is unknown, the transformation has to be adaptive. Since the lattice 
structure, whose computational complexity grows only like N, is known 
to generate “white” uncorrelated outputs by a process called inverse 
filtering that keeps removing correlated components from the input 
signal,” it has been proposed for this application. However, the 
outputs of the lattice structure are uncorrelated only after it has 
converged to its steady state; therefore, it may not converge as fast as 
the Godard algorithm. Recently, Morf was able to formulate the lattice 
algorithm in a special form such that its outputs are uncorrelated in 
the mean square sense for all instants of time.’*’* Our purpose is to 
extend Morf’s works to compute an adaptive estimator which is 
equivalent in performance to Godard’s. Moreover, we will demonstrate 
that the computational complexity of the adaptive lattice algorithm 
compares well with the fast Kalman algorithm of Falconer and Ljung.° 
The advantage of the lattice structure is the ease of changing the 
number of coefficients. It is also conjectured that the lattice algorithm 
will be less sensitive than the Kalman algorithm to finite-precision 
digital implementation. Recently, this was observed in Ref. 14. It is 
also discussed in Ref. 15, where the development of an equalizer based 
on the adaptive lattice algorithm is presented in a form similar to the 
one given here. One case to illustrate this property of the lattice will 
be given at the end of this paper. 

In the next section, the optimal least mean square estimator and 
predictor are precisely defined, and the minimal error that results is 
given. In Section III, several properties of the optimal predictor are 
explored and are related to the estimation problem. In Section IV, an 
efficient (in the sense of small number of computations) lattice form is 
derived, using the relations developed in Section III, that maintains 
the optimal convergence. In Section V, the properties of this lattice 
form are compared to the steady-state lattice structure. Suggestions 
for further work are also included. 
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ll. OPTIMAL MEAN SQUARE ESTIMATION 
2.1 Notation and definitions 


Given a discrete time input data sequence {7,;} 1 = 0, 1, ---, it is 
desired to find the set of weights for a transversal tapped delay-line 
filter such that the output of this filter be a good estimate of another 
sequence {d;}. An adaptive equalizer, for example, has the received 
signal as its input, while its output should provide an estimate of the 
transmitted data. In a transversal filter, a vector of filter coefficients, 
of length p + 1, operates on vectors of data that are shifted versions of 
the input data for time 0 = ¢ = T defined by 


Yor = (yr, YT-1, **%* ; ¥T-p), (1) 


with y’ being the transpose of y and it is assumed that y; = 0 for i < 0. 
As we are concerned with an adaptive, i.e., time varying filter, its 
weight vector of order p + 1 will be denoted 


Wp,T = [w,r(0), W,,T(1), ery Wy,T(p) |. (2) 


Using these definitions, the output of the p + 1 long linear estimator 
at time T is d,7 given by 


A 


dp,T = WT Yp,T- (3) 
Now suppose that w,,r is the best predictor for time T, then the total, 
or accumulated, mean square estimation error up to time 7, when 
using this predictor, is given by 
T 


E(wp2r) = ¥) (di ~ wh.r Yad” (4) 
The sequence of weight vectors that minimizes eq. (4) at every instant 
of time T, is the most rapidly converging sequence, and is called the 
optimal adaptive filter. 
Making use of the following time-domain definitions of the cross 
correlation vector and the autocorrelation matrix, 


ik 
Y didpi = 8,7 (5) 
i=0 
T 
» Vis = Lp,T; (6) 
i=0 
one obtains 
T 
E(Wp,r) = ¥, di — 2wp,r&p,r + Wp,rRp,tWp,r. (7) 


i=0 
2.2 The optimal estimator 


Equating the gradient of eq. (7) with respect to w,,r to zero gives 
Rp,rWp,T = pT (8) 
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It is seen that solving for the optimal estimator is equivalent to 
inverting a matrix at every new sample point: 


Wp,T = Rarpr- (9) 


Godard showed that wy,r can be updated with on the order of p” 
calculations,” which is an improvement over the simple matrix inver- 
sion, requiring on the order of p* calculations. Algorithms that require 
only on the order of 10p operations for obtaining the optimal estimator 
appeared** subsequently, and are called “fast” algorithms. 

The essence of this paper is to derive the fast algorithm in a special 
form, called the lattice form. This form was proposed to speed the 
convergence of the weight vector of an adaptive predictor to its optimal 
value (see Refs. 7 to 11). As will be seen in the next paragraph, the 
estimator problem is closely related to the prediction problem. 

From eqs. (7) and (8) it is seen that the minimal total mse that 
results using the optimal estimator is simply 

T 


Hopt(Wp,7) = »y d? i Zo RpTpr- (10) 
i=0 

This is in constrast to the adaptive gradient algorithm whose perform- 
ance is more difficult to analyze. 


2.3 Optimal prediction 


The problem of prediction is more basic, but similar to the problem 
of estimation. Solving this problem will be shown to simplify the 
solution of the estimation problem. For linear prediction, a set of 
weights is used to linearly estimate the present input point from past 
values of the input data. Let the set of p weights at time T be 
{—a,,7(1), —@p,7(2), +++ , —ap,r(p)}, so that the error when predicting 
the input point y; is given by 


€pi = AbTYpi; (11) 
with 
pT = [1, Qp,r(1), re Qp,T(p)]. (12) 


The error generated in predicting the input is that part of the input 
which is uncorrelated to past values of the input. This is a desired 
feature for fast convergence of adaptive filters. 

The total square error up to time T is, thus, given by 


T 
> Opi = AprRp,rApr-. (13) 


i=0 


Taking derivatives with respect to a,,r(1) to ap,r(p), it is found that 
the predictor weight vector that will minimize the total mse up to time 
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T is the solution of the last p equations of the expression 


R¢ 
Rp,rAp,r = rag ; (14) 
with R§,r yet unknown and 0” = (0, --- , 0)’ vector of order p. Using 
this optimal predictor in eq. (18) gives 
T e 
») en: = Aib,rRptAptr = Abr ig = pT’ (15) 
i=0 


Therefore, Rj,7r is the minimal total mse that will result. 

As before, obtaining the optimal predictor A,,r for all T 1s equivalent 
to inverting the matrix R,,r for all T. An efficient algorithm for doing 
this will be described. It should be noted, from comparing eqs. (8) and 
(14), that the latter is a simpler “homogenous” set of equations, except 
for the end term R;,7; therefore, its solutions can serve as a basis for 
the solution of eq. (8). 


ll. DERIVATION OF THE ORDER AND TIME UPDATE RELATIONS 
3.1 Time shift properties of R,,r 

The vectors y,,7 for successive values of T are shifts of each other. 
As these vectors build up the matrix R,,r in eq. (6), it is expected that 
shifted versions of the solutions to the predictor and estimator equa- 
tions will serve in updating these solutions. For doing this, the shift — 
properties of R,,r are explored. For the (2, 7) term in eq. (6), we have 


T 
Rpt, J) = Y yesi-i¥rvi—-j = Rp-17(t, J) 
k=0 


for allp-12=i1,j721 (16) 
T . T-1 
Rprtit1, 7 +1) = ¥ ye-ive-j = YO Verrier 
k=0 ja 
T-1 


= »> Yk+1~i VR F p-1,T-1(, J) 
k=0 


for allp —12=1, 721, (17) 


where the fact that y; = 0 for 1 < 0 was used. Using Morf’s notation, 
these relations can conveniently be written as 


(Ropar X\_ (X X 
Rpr == (3 x) _ (i Seal (18) 


with X being any other term in the matrix. It is clear from eq. (18) that 
R,,r is symmetric, but not Toeplitz, if steady state is not reached. 
Therefore, the properties of Toeplitz matrices cannot be used, as is 
done for example in claiming fast convergence in Ref. 7. 
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3.2 Order update relations 
It will be useful to define, similar to eq. (12), a backward prediction 
vector 
Bp,r = (bp,r(P), bp,r(p — 1), +++ , bp,r(1), 1), (19) 
with the backward error given by 
l'p,T = B p.TYp,T; (20) 


1.e., it is the error in predicting yr_, from yr to yr-pii. To minimize the 
total mean square backward prediction error up to time T, B,,r should 
be the solution of : 
0? 
Rp,rBp,r = ( r ) (21) 
p,T 

It is seen that this is another set of homogenous equations, except 
for the lower one. Again, the optimal error 7,,7 will be orthogonal to 
YT—-ptl, *** 5 YT- 

A recursive procedure will be derived in the Appendix for generating 
solutions to eqs. (14) and (21) for increasing order p. 

It is shown to be 


Apii,7 = be — kprRp 7-1 (, ..) (22) 


for kp,r as defined in the Appendix, and the higher order total error is 
Roar = Rpr— kprRpr-1 = Ror —kprRprRpr-1). (23) 


As increasing the predictor order would not increase, and typically will 
decrease the error, it should be that 


| 1=k27Rp7Rpr-1 = 0. (24) 
Similarly, 
ts 0 = —e Ap,r 
Bp+1,7 = ( Dar) Rpt Rpt (5 ) (25) 
and 
Ro+ir = R51 — kprRpr. (26) 


Similar relations hold for the prediction error, when multiplying eqs. 
(22) and (25) by yp+i,7 


€p+1,T = Cp,T = Rp,rRp.r-11p,T-1 (27) 
Vp+1,T = lp,T-1 — 0, TR p,Tep,T- (28) 
The following auxiliary quantities are needed 


Ch.T = RpTyp.T (29) 
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Yp,7 = Cor yp.t = Yp,tRprypt- (30) 


The order update of these quantities will also be derived in the 
Appendix. It is shown to be 


Cpr 
= P; _ 
Cp+1,7 = (5 + Tp+1,7R p+1,7TBp+1,7 


= on" + Bp+1,7TBp41,7 = i ) (31) 
Lp+1,T 


with pip+i,r defined by 
Mp+iT = Tp+1,7 41,7 (32) 


and, as seen, is the last term of C,41,7. 


3.3 Time update relations 
To obtain the time update of A,7, use is made of the following: 
Rp,r+i = Ror + Yp,7+1¥p,741- (33) 


This relation is shown to give 
0 
'p-1,T 


Here, the definition 
ep,r+1 = Ap.ryp,T+1 (35) 


is used for the tentative prediction error before updating the prediction 
coefficients. As for the minimal total mse, it is updated according to 


e 
= t R si+1}) __ t 
Rye = Apr 3 = Ap,rRprviApr+ 


= Air(Rprt Yp,T+1Yp,7+Ap,T+1 =Rart €p,T+1€p,T+1- (36) 


It should be mentioned here that only in the stationary case e,741 = 
e>,r+i and, thus, Rp,r+1 = Rp,r + e7,7+1. As for updating B,,r, two dif- 
ferent possibilities are derived in the Appendix: 


Cr 
Bp, = Bpr — rp.T+1 (° - (37) 
or 
; 1 
By,r+1 = (Bp, — 1p,7r+1Cp,r+1) X ———g-—— (38) 
1 — Vr p,T+1pp,T+1 
From eq. (37), a relation like eq. (36) can be obtained 
Ri rv = Ror t+ rp,r+il p74: (39) 
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Again, r},r+1 is the tentative backward prediction error. The time 
update of C,,7 is also obtained in the Appendix. 
From these results, a simple update for k,,7r is found to be 


0 
ky +1 = p,T + € p,T+1lp,T- (40) 


or alternatively, from eqs. (79) and (83) 
Rp,r+i = Rpr + Cprsilp,r(1 — Yp-17) = Rp,r + ep,r4il pT. (41) 


All these relations are derived in the Appendix. They form the basis 
for the lattice network which update these quantities both in order 
and time. 


IV. EFFICIENT CALCULATION OF THE OPTIMAL ESTIMATOR 
4.1 Tapped delay line estimator 


The optimal estimator for any order p and for each time T is given 
in eq. (9). Using eqs. (5) and (6), we get 


Rpr+i Wy,T+1 = Ep,T+1 = Ep,T + Ars T+ 
p,TWy,T + Ar41YT+1 


p,T+1Wp,T + (dr41 — Wp,TYp,T+1) Yp,T+s (42) 
Therefore, 


t —1 
Wp,T+1 = Wp,T +(dr41 =. Wp, TYp,T+ Et p,T4+1Yp,TH1 
40 
= Wp,r + (dr41 — Ap,r4i1)Cp,r+1. (43) 


Note that updating w,,r involves the tentative estimate A? r4 = 
Wy,T Yp,T+1 USIng the new data and present estimator weights. This 
makes it possible to implement this scheme in decision-directed equal- 
izers, where the decision on which dr+: was transmitted is based on 
a? T+. Also note that the correction to w,,r is in the direction of Cp,7r+1 
= Rp.7r+1yp,r+1 rather than yp,7+1, as in the gradient algorithm. These 
vectors are parallel only if R,p,r+1 is a unit matrix times a scalar; thus, 
all its eigenvalues are equal. When this is not the case, and yp,rsi 
contains eigenvectors corresponding to different eigenvalues, Rp.7+1 
equalizes the gains for these vectors. Also note the similarity in the 
updating equations (34), (37), and (43) which is to be expected, since 
prediction is a special case of estimation. 
The fast Kalman algorithm is an efficient recursive procedure to 

obtain C,,7+1. This is given in Ref. 4 as follows: 

1. Assume that all vectors are available up to and including 
time 7’. 

2. Use eq. (35) to obtain ef 741 = Abr yp,T41- 


3. Use eq. (34) to calculate Apr41 = Apr — chrn( C : ) 
‘p-1,T 
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4. Use eq. (11) to calculate ep,.7+1 = Ap,r+1Yp,T+1- 
5. Use eq. (36) to calculate Ri.r41 = Ror + ep.7+1€p,741- 
6. Calculate ep.741R 5741. 
7. Use eq. (89) to calculate 
0 ~e 
Cp,T+1 = e + Cp, T+itt p.7T4+1Ap,T+1- 
'p-1,T 
8. From eq. (31), pp,r+1 1s found. 


9. Find rp,T+1 = Bier yp,T +1 
10. Use eq. (38) to calculate 


1 


= 0 
1 lr p,T+1Lbp,T+1 


11. Use eq. (73) to calculate 


C,= 
( . “a = Cp,T+1 = Up,T+1Byp,7+1. 
12. Calculate the tentative estimate do.741 = wi,rypre. 
13. Use eq. (43) to update the estimator weights 


40 
Wy,T+1 = Wy, T + (dr41 — aT+1) Cp, T+1- 


The initial conditions, when there is not enough input data so that 
R,,r in eq. (6) does not have an inverse, are discussed in Ref. 4. There 
are 10p + 5 multiplications, 9p + 4 additions, and 2 divisions for one 
complete updating cycle. Note that there are no matrix operations, 
only additions and products of scalars and vectors is involved. By 
comparison, the simple fixed-step gradient algorithm requires 2p + 1 
multiplications and 2p + 1 additions per cycle. 


4.2 Lattice structure 


Here an equivalent algorithm that also gives the optimal estimator 
with about the same number of computations is derived. It is assumed 
that the input data are transformed by the lower triangular transfor- 
mation matrix 


B B 
LT = ( a qr! eee By) (44) 


From eq. (20), the transformed data are 
Lp.T pT = (Yo,7r11,7 °° * Mp,T)* = Tp,T- (45) 


Note that as the dimension p increases, new terms are included in 7, 
but the previous ones are unchanged. This is an important property of 
the lattice algorithm that enables us to change the estimation order 
without the need to recalculate all previous values, as for example in 
the fast Kalman algorithm. As before, we let the tentative transformed 
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vector be 
Fp.r+1 = Lp,ryp,r+i- (46) 
Define a new vector of weights 
Hp,r = (ho,rhi,r +++ hp,r), (47) 
that now operates on 7;,r to give the estimate 


dp,r = Hp,rrp,r = Hp,rLp,ryp,r- (48) 


It is seen that for this estimate to be equivalent to the optimal 
estimator, w,,r is the transform of H,,7, and from eq. (8) 


Lp, &p,7 = Lp,rRp,rWp,r = Lp,rRp,rLp,rHp,r. (49) 
Using eq. (21), it can be shown that 
Rir 0 0? 
R,,rLp,7 = Rir «+ (50) 
xX xX Rpt 


which is a lower triangular matrix. The product L,,7R,,rL;,r is, thus, 
a symmetric product of two lower triangular matrices; therefore, it 
should be symmetric, lower triangular, and diagonal, 1.e., 


LprRprL>,r = D,r. (51) 
The diagonal terms are easily found using eq. (50) 


0! 
D,,r(i, i) = [Bir(0"™)'] [a | = Rir (52) 
xX 


and, again, they are independent of p. 

At this point, a closer inspection of L,,r is of interest. It is a lower 
triangular matrix with 1 on the main diagonal. Therefore, L;,7 has the 
same structure. Therefore, eq. (45) can be rewritten as 


rp,T = ¥p,T — (Lor = Ip) p,r- (53) 


This can be looked upon as a Gram-Schmidt procedure to calculate 
new orthogonal components of 7,,7 from the components of y,,7, minus 
their projections on the previous 7,,r components. Thus, eq. (51) 
represents the fact that the autocorrelation matrix of the transformed 
data is indeed diagonal. 

Using eqs. (49) and (51), H,,r can be found from the transformed 
&p.r by 


Apr = Dp,rLp,7 &p,t . (54) 


It should be noted that only scalar divisions rather than matrix 
inversion is needed here and increasing the order of the lattice 
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estimator does not change previously calculated values of H. This is 
why double indices are used in eq. (47) as compared to triple indices 
in eq. (2). Equation (54) can be broken to p scalar equations 


hp,r = Rpt Bpr8p,r- (55) 


To see how the right-hand side develops in time, define 


Pp,r = Bp,r&p,r- (56) 
Then from eqs. (87), (5), (29), (9), and (3) in that order 
Pp,T+1 = Bp,T+18p,T+1 
= [Bir — rpr+i(Cp-1,7+10) |( gp,7 + r+1yp,7+1) 
= Ppt t+ (dr41 — dp-1.7+l).T41 = Ppt + Vp-1,7+11 pT+1; (57) 
where V,,r is the estimation error after the pth order estimator. For 
p=0 
P0,T+1 = £0,T+1 = por + Ars+i r+, (58) 
ie., d-1.7r+1 = 0. Obviously, 
V,r = dr — dpr = dr — Hirrp.r = Vp-17r — ptr pr- (59) 
The recursive solution to eq. (55) that corresponds to eq. (43) is: 
hyo = Rpr+i(pp,r + Vp—-1,7+11 p,T+)) 
= Roel (Ro rs — p,r+il p+ Apr + Vp-1,7+1"p.T+1] 
=hpr t+ Rprai(Vp-1741 — App.) p,7+1- (60) 


This is equivalent to the first tap of the conventional tapped delay line 
equalizer for p = 0 only. The tentative estimate as in eq. (43) is now 


A 


0 t t t <0, 
Ones = Wy,T ptt = Hp,rLp,ryp,t+1 = Apt? p,r41- (61) 


The minimal total squared error is from eq. (51) 


T T 
E,r= Y, d?- gprRpr&pr = ¥ di — (Lp,r&p,r)‘Dpr\Lp,rfp,7)- (62) 
i=0 


i=0 
From the structure of D and L it follows that 
Ep+it = Ep,r — (Bo+i,r gps.) Rptr = Ep — ppsi,rR ptr. (63) 


Using eq. (57), the residual error can be found for all instants of time. 
It is then simple to decide whether p should be increased, decreased, 
or unchanged to meet the desired performance. As mentioned before, 
when adding or deleting sections no recalculation of the coefficients is 
needed. 

The procedure for recursively obtaining the estimator H,,r and the 
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estimate d,,7+1 is as follows: 

1. Assume that all quantities are known up to and including 
time 7’. 

2. Start with ep.7r+1 = ep r41 = Mp,r+i = prs = Yrui for p = 0. 

3. Use eqs. (22) and (25) to compute 


eLrsi = 0,741 — (korR OTS 0,7 
rire = 10,7 — (ko,rRoy)eo,r+1- 
4, Use eq. (26) to compute ho,r+1 = Ror + €0.7+11 0,7. 
5. Use eqs. (36) and (39) to compute 
Rérui = Ror + €0,74+1€0,7+1 
Ror+i = Ror + Yo,T+190,T+1- 
6. Compute the gain terms ko7+1Ro7 Ro,r41Ro741 to obtain from 
eqs. (27) and (28) 
€1,7+1 = €0,7+1 — (ko,7+1R0,7)ro,r 
ry74+1 = Yor — (ko,7r41Ro74+1)€0,74+1.- 


These gain terms can be saved for the next recursion. 
7. Repeat steps 3 to 6 for p = 1, 2, --- 
8. Use eq. (61) to compute the tentative estimate 


aA 


0 = t =0 
Ap,T+1 = p,TT p,T+1:+ 


9. To update H,,r start with V_1,7r+1 = drs: from eqs. (58) and (59) 
and use eq. (60) to compute | 


= 0 
ho,r+1 _ hor + FR 0,7+1( V-1,741 = ho,rro,r+)r 0,7+1- 


10. Use eq. (59) to compute 
Vo,r+1 = V_1741 = ho,r+110,7+1- 


11. Repeat steps 9 and 10 for p = 1, 2, ---. 

12. Use eq. (48) to compute dp,r41 = Hp7+17 p74: 
Steps 3, 6, 8, and 10 can be drawn in a block diagram like in Fig. 1. The 
variable gain terms are k,7rRp7-1, Rp,rR5,7, and hy,r, and they are 
updated in steps 4, 6, and 9. When the system reaches a steady state, 
it can be illustrated in a simpler form as shown in Fig. 2. 


4.2.1 Starting the algorithm 


The problem of finding the optimal predictor/estimator of order p 
is not well defined if there are less than p input points. Therefore, 
when starting the algorithm, p should be 0 in the first recursion, 1 in 
the second, and p should grow linearly in time until it reaches the 
desired number of sections of the lattice. This is in contrast with the 
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Fig. 1—The basic form of the lattice estimator. 


fast Kalman algorithm, where a small diagonal matrix is assumed for 
Ryo. 


4.2.2 Number of operations 


The number of operations required for the three algorithms, the 
simple gradient, the fast Kalman, and the lattice, are given below 
where p is the number of adaptive parameters: 


: Multiplica- = oe 
Algorithm tions Additions Divisions 
Gradient 2p 2p — 
Fast Kalman 10p 9p 2 
Lattice 12p llp 3p 


V. DISCUSSION 


It was shown in eq. (9) that the optimal linear estimator that yields 
the least total mse is obtained by matrix inversion. A recursive algo- 
rithm to update the optimal estimator also involves an inversion of 
the correlation matrix of the data as in eq. (43). If the input data are 
uncorrelated (i.e., low signal embedded in flat noise, or data signal with 
Nyquist spectral shape), then multiplying by Rp! is equivalent to 
scalar division, which is the simple gradient algorithm. However, if the 
data are highly correlated and R,,7 has its eigenvalues spread out 
(Amax/Amin >> 1), then the optimal recursive algorithm for the fastest 
convergence of the estimator is more complex: The estimator can still 
have the form of a tapped delay line, but now the shift properties of 
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Fig. 2—The basic form of the steady state lattice estimator. 


R,,r are used to update w,7 in 0(p) multiplications. A different 
approach demonstrated in this paper is to transform the input data 
using the lattice network to get uncorrelated inputs to the estimator. 
The estimator weights are now different but related to the original set 
through the same transformation eq. (48). This is similar to Ref. 6, 
except that the transformation matrix is time variant. The performance 
of the nonstationary lattice and the fast Kalman are the same—both 
give the minimal error—and in Ref. 2 it is demonstrated that conver- 
gence time can be reduced by a factor of 15, compared to the simple 
gradient algorithm. 

The differences between the lattice and the fast Kalman algorithms 
in practical, finite precision digital implementation should be fully 
discussed elsewhere. However, an example can be given here. Exam- 
ining step 5 for the Kalman algorithm, eq. (36) may occasionally render 
R>,r+1 which is nonpositive because of the accumulation of arithmetic 
errors. The algorithm is useless from that time.on, and has to be 
restarted. On the other hand, for the lattice algorithm, step 5, if either 
Rj,r+1 or R5.7r+1 become nonpositive, force them to be some small but 
positive number, and make all k;,7+1 equal zero for i = p. The updating 
algorithm then falls back to the gradient algorithm from tap p and on, 
or the filter length can be shortened to length p, as desired. 

Future work should try and make use of the above recurrence update 
relations for the exponentially weighted errors, the ‘fading memory” 
case under time-varying situations. Also simpler, suboptimal algo- 
rithms can be derived and should be compared to the exact algorithm 
in terms of performance and complexity. 
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VI. CONCLUSIONS 


(t) The lattice algorithm gives identical results to the fast Kalman 
algorithm for adapting filter coefficients when both have the same 
number of coefficients. 

(iz) The number of multiplications for the two algorithms is about 
the same, but the lattice requires more divisions for normalization by 
the residual error energy at each stage. 

(tit) Changing the number of taps is easier under the lattice algo- 
rithm. 

(tv) In hmited-precision implementation under severe amplitude 
distorting channels, the last property of the lattice algorithm may be 
valuable in providing better performance. 


APPENDIX 


A.1 Derivation of the order update of A,,r 
From eqs. (14), (18), and (20), it is found that 


Rpt 
ton“) =( a)(4)-(F) 
kp 


and similarly 
0 how 
Rp+1,7 ( ) = 0” (65) 
ee Rp T-1 


for some kp,7,k;,r. From the fact that A,,7r(0) = B,,r-1(p) = 1 it can be 
shown that 


Rp A 
Rpt = (0 Bir-1) 0? = (0 By 7-1) RptiT ( 4 
kyr 
a , P\tpr Apr _ Le 
= ( p,T(0 )'Rp,7T-1) ( 0 ) — " p,T- (66) 


Combining egs. (64) and (65) in a proper way, it is found that 


A e 
Rp+1,7 ( , — kp R pr-1 ( 2) = Va), (67) 


Rpr-1 = (Rj,7-1) (68) 


with 


Therefore, 
_ [Apr _ ap 0 
Ap+1,7 = ( 0 ) ky,rR 57-1 ( coat (69) 
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A.2 The order update of C,.r 
For the order update of C,,r, note that 


Rp+1,7 es = «3 (70) 


Q?t! 
Rp+1,7Bp+1,7 = ( : ). (71) 
p+1,T 


and 


From this it can be seen that Cp+:,7 is a linear combination of es 


and Bpitr. 
From the relation 


((0°**)'R 41,7) Cp+7 = BosirRp+7Cp+.7 


= B p+1,TYp+1,T = p+1.T) (72) 
it then must be that 
C ae 
Cp+1,7 = ( ?7 + rp4i7R p41,7Bp+i,7 
C. X 
= ( a ot Up+1,7B 41,7 = ( ) (73) 
Up+1,T 
with the definition 
Bp+1,7 = Tp41,TR 41,7; (74) 


L.€., fp+i,r is the last term of Cp+1,7. 
Therefore, 


_ yt = 2. x 
Yp+1,T = Cp41,7Yp+1,T = yer t+ 1p+17R 41,7; (75) 


and, thus, 


P 
vet = > rirRir. (76) 


i=0 


A.3 The time update relations 
From eq. (33) the time update of A,,r is obtained as follows: 


Rp,t+1Ap,t = (Rp,r + Yp,T+1Y p,T+iAp,T 


Ror 
= ( oP + Yp,T+1€ p,T+1 


e 0 
- (*5") a eee ( re ) eee (77) 
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for some R§,7+1, with the definition (35) for e},7+1, From eq. (77), it can 
be seen that 


Ap, T+1 = £1p,T — e4,T+1 ( : ) (78) 


Cp-1,7 
and multiplying both sides by y,,7r+1 gives the relation 
€p,r+1 = Cp,T+1(1 — Yp-1,7) for p=1, 2, +>: (79) 


As for p = 0, we get from the definition that 


0 ze 
€o,7+1 = @o07+1 OY y-17r= 0. 


The time update of B,,7 is obtained similarly: 


0? 
Ry r+iBp,r = (R,,7 + Vo,T+1Y p,T+) Bp,T — & ; + Yp,T+il p,T+1 


= ( el + Rp 41 “wo ry, T+1- (80) 
Therefore, 
By,rv1 = Bor — 1 p,rv1 ‘em, (81) 
As in eqs. (36), (78), and (79), 
Rpr+i = Ror + rprsilpT+1 (82) 
and | 
rpT+1 =1p,r+i(l — Yp-1r+1) for p=1,2,--- (83) 
and 


Yo,T+1 = rO,T+1 = t+1- 
The time update of B,,r can also be obtained as follows: 


Rp,r+1 Bar = (Rp, + Vp,T+1 ¥>,7T+1)B pT 
0? 0 0? 0 
= r + Yp,T+1 p,T+1 = r + Rp,r+1 Cor+il p,T+1- (84) 
Rp Rpt | 


Thus, 
1 


0 ’ 
l= Pp,T+1Ep,T+1 


(85) 


Bp r+ = (B,,7 as rpT+i Cp,T+1) xX 


where the denominator is chosen to make 0,,7+:(0) = 1 using the 
definition (32). The time update of C,,7r is obtained as in eqs. (70) to 


(73). 
0 Xx 
Rp,+1 (c...) = (,*.) (86) 
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and 


Rp,r+1Ap,r+1 = Pa (87) 


Therefore, C,,7+1 1s a linear combination of 
0 
Cp-1,T 


[Rf,7+1(0")'\C, 741 = Ap,r+iRp,r+iCp,r+1 = Ab,T+1Yp,T+1 = €p,T+1; (88) 


it is found that 


and A, 7+:. From 


0 2 
Cp, T-+1 aT a ; + €p,7+1 Rp r+iAprs+t. (89) 
Multiplying by yp.7+1 gives 
Yp,T+1 = Yp-1,T + enr+iR prt (90) 
and | 
P 
Yp,T+1 = > Ci T+i+1-pR ET siei=p: (91) 


i= 


For the time update of K,,r, use the definitions (64) and (65): 


Rp,T+1 =[k +1(0?) Rp 7] © 


A 
= (0 Bhr)(Rp+1.7 + Ypti,7+1Y pti,T+) ( 57) 


0 
Rp 
= (0 Bir) | 0? |)+rprepri=Rprteprsulpr (92) 
kyr 
Using eqs. (79) and (83) an alternative form is 
Rpr+i = Rp. + €brsilp,t(l — Yp-1,7) = Rp + Cp,rilpt- (93) 
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